Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Induction Variable Simplification Pass promotes i32 operations to i64 #21522

Closed
tstellar opened this issue Oct 3, 2014 · 9 comments
Closed

Induction Variable Simplification Pass promotes i32 operations to i64 #21522

tstellar opened this issue Oct 3, 2014 · 9 comments
Assignees
Labels
bugzilla

Comments

@tstellar
Copy link
Collaborator

@tstellar tstellar commented Oct 3, 2014

Bugzilla Link 21148
Resolution FIXED
Resolved on Mar 24, 2016 20:17
Version trunk
OS Linux
Attachments Test Case, Test case with only one pointer
CC @atrick,@hfinkel,@arsenm,@irishrover

Extended Description

I've run into the issue where the Induction Variable Simplification Pass will
promote i32 operations to i64, which negatively impacts performance on AMDGPU targets.

The problem occurs when the induction variable is used for address calculations of pointers with different sizes. For example:

kernel void test(global int *out, int a) {

private int array[16];
...
for (int i = 0; i < 16; i++)
out[i * 3 + a] = array[i];
}

Here global pointers are 64-bit and private pointers are 32-bit. In this case the indvars pass tries to promote the induction variable to 64 bits which causes the address calculations for the out pointer to be 64-bit operations.

I've attached an LLVM IR test case the demonstrates the issue.

@tstellar
Copy link
Collaborator Author

@tstellar tstellar commented Oct 3, 2014

assigned to @tstellar

@atrick
Copy link
Contributor

@atrick atrick commented Oct 25, 2014

Sorry, I meant to respont to this long ago.

If you only really care about 64 vs 32 bit multiply, I think this can
be fixed without a target hook, because it will be generally good for
any architecture.

In isSimpleIVUser, just check if the SCEV expression is a multiply with more than one non-constant operand.

Just run LLVM test suite on x86 to verify no significant regressions.

I hope that works...

@llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Oct 27, 2014

Hi Andrew,

I think checking if the SCEV is a multiply would work for my cases, and seems to work for Tom's example. It can be a quick work around.

However, the general problem behind the scene still remains in the long term. Just think about a slight variation of my example:

attribute((global)) void foo(int n, int *output) {
for (int i = 0; i < n; i += 3) {
output[i + 2] = i + 4;
}
}

"i" is not involved in any multiply, but widening it to i64 is still a bad idea for the NVPTX backend. We may still want the target-hook approach in a long term.

Jingyue

@atrick
Copy link
Contributor

@atrick atrick commented Oct 27, 2014

If your target never wants to widen induction variables, then a target hook makes perfect sense.

Checking for multiply by non-constant makes sense in general.

They're both good fixes.

@llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Oct 27, 2014

Agreed. Are you sending out the patch soon? Sounds like it's ready already. Thanks for working on this!

@atrick
Copy link
Contributor

@atrick atrick commented Oct 27, 2014

No, I don't have a patch. I think either fix would be easy though.

@tstellar
Copy link
Collaborator Author

@tstellar tstellar commented Oct 28, 2014

If your target never wants to widen induction variables, then a target hook
makes perfect sense.

I would prefer adding this target hook, because I don't think we ever want to widen the induction variable for R600.

I'm curious though, in what scenarios is it beneficial to widen induction variables?

Checking for multiply by non-constant makes sense in general.

They're both good fixes.

@llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Nov 13, 2014

Hi Tom,

r221799 fixed the issue on the NVPTX side by having IndVarSimplify check the cost of the widen operation. Are you going to take it over and implement the getArithmeticInstrCost interface in AMDGPUTargetTransformInfo? I would be happy to do that but I don't have specific knowledge on the cost model of AMDGPU.

Thanks,
Jingyue

@arsenm
Copy link
Contributor

@arsenm arsenm commented Mar 25, 2016

Should be fixed by r264376

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants