Inconsistent Output at -O1 and -O2 Optimization Levels on PowerPC64 Due to Complex Type Casting and Nested Loop Structure #71030
Comments
@llvm/issue-subscribers-backend-powerpc Author: None (gyuminb)
### **Description:**
The Proof-of-Concept (PoC) code provided demonstrates an inconsistency in the computed results of an `unsigned long long int` variable when compiled using Clang-18 for the PowerPC64 architecture. The discrepancy is observed specifically under the optimization levels `-O1` and `-O2`. The output of `computedResultUll` displays inconsistency as shown below:

```
Computed Result (ULL): ffffffffffffffff
Computed Result (ULL): ffffffff
Computed Result (ULL): ffffffff
Computed Result (ULL): ffffffffffffffff
```

### **Environment:**

### **PoC:**

```c
#include <stdio.h>

// Define a macro to find the minimum value between two numbers
#define MIN(a,b) \
    ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
       _a < _b ? _a : _b; })

// Global variables
short globalShortValue = (short)1;
signed char globalCharValue = (signed char)0;
unsigned long long largeNumber = 14782061517590169264ULL;
int someIntValue = 378441747;
unsigned int unitIncrement = 1U;

// Variables to store the results of computations
unsigned long long computedResultUll = 0ULL;
short computedResultShort = (short)0;
unsigned char computedResultUChar = (unsigned char)0;
_Bool computedResultBool = (_Bool)0;
unsigned char computedResultChar = (unsigned char)0;

// Arrays used in computations
short shortArray[8];
long long int longArray[8][8];
int intArray[8][8][8];
unsigned long long ullArray[8];
signed char charArray[8][8][8];
short resultArray[8][8];

// Initialize arrays
void initializeArrays() {
    for (size_t i = 0; i < 8; ++i) {
        shortArray[i] = (short)-1;
        ullArray[i] = 1;
        for (size_t j = 0; j < 8; ++j) {
            longArray[i][j] = 0LL;
            resultArray[i][j] = (short)1;
            for (size_t k = 0; k < 8; ++k) {
                intArray[i][j][k] = 0;
                charArray[i][j][k] = (signed char)0;
            }
        }
    }
}

int main() {
    initializeArrays();

    // Main loop for computations
    for (short index = 3; index < ((int) (short) largeNumber) - 1705/*7*/; index += 4) {
        computedResultUll = (unsigned long long) ((int) MIN(globalShortValue, shortArray[index])); // Potential issue here
        for (int i = 0; i < 8; i++) {
            for (signed char j = ((int) (signed char) someIntValue) - 19/*0*/; j < 8; j += 4) {
                for (long long int k = 2; k < 4; k += 4) {
                    computedResultShort -= (short) unitIncrement;
                    computedResultUChar = (unsigned char) (_Bool) MIN((short) globalCharValue, shortArray[index - 1]);
                    charArray[2][0][index] = (signed char) globalShortValue;
                    resultArray[0][0] &= (short) longArray[0][j];
                }
                for (int l = 1; l < 7; l++) {
                    computedResultBool = (_Bool) (ullArray[index]);
                    computedResultChar -= (unsigned char) intArray[0][index][j];
                }
            }
        }
    }

    // Print the result
    printf("Computed Result (ULL): %llx\n", computedResultUll);
}
```

### **Expected Behavior:**

Regardless of the optimization level, the value of `computedResultUll` should be consistently and accurately computed as an `unsigned long long int`.

### **Observed Behavior:**

When compiled with Clang-18 under `-O1` and `-O2` optimization levels, the computed value for `computedResultUll` shows inconsistency:

```
Computed Result (ULL): ffffffffffffffff
Computed Result (ULL): ffffffff
Computed Result (ULL): ffffffff
Computed Result (ULL): ffffffffffffffff
```

### **Analysis:**

The inconsistency is identified under the following conditions:

These conditions are specific and intricate, but the inconsistency is notable and is not attributed to Undefined Behavior.

### **Steps to Reproduce:**

### **Conclusion:**

The observed inconsistency in extending values to `unsigned long long int`, under specific conditions involving complex loop structures and type casting operations, when compiled using Clang-18 at `-O1` and `-O2` optimization levels, warrants further investigation and resolution.

I've taken a quick look at the assembly for this test. For O2 we produce the following code to compute the MIN for
For O3 we produce the following code:
For O1
The bottom line is that for

Looks like we are changing LHAX to LWZ in the register allocator. Is this an attempt to reduce live range?
Should that be an STD and LD instead of STW and LWZ?

@gyuminb Did you check that the code is UB free using UBSan? Seems like

Yes, when I checked the code using the `-fsanitize=undefined` option, there were no instances of undefined behavior (UB).

This is a bug in PPCMIPeepholes. It is quite subtle, but a bug nonetheless. This is probably why it requires all the complexity in the test case. Here's the gist of it:

Ultimately, we have to remove all the

In order to prevent lost opportunities to remove redundant sign-extend instructions, perhaps the peephole can be modified to look at the uses of instructions that sign-extend to 32 bits and, if all of those uses then sign-extend, convert the instruction to a sign-extend to 64 bits and update the uses to 64-bit uses. But that's a bit more involved.

I compared the asm code. The difference is in how the value is loaded from the stack: it only loads 4 bytes.
The correct one:

The bug only happens in 64-bit mode. In the PPCMIPeepholes optimization, if there is an instruction
LHA will be lowered to
2. The instruction that defines the register RS, which is used by for example:
In this scenario, the code can be optimized to
But in some special situations there is a problem when a spill happens. In example 1 (a code snippet from a function in 64-bit mode, with a spill of r4),
will change to the following code after the spill. It spills r4 into memory (4 bytes) with
To fix the problem, we need to promote the LHA to LHA8, which uses a 64-bit register
will be changed to
If there is a spill between the
In example 2:
We do not know when a spill will happen, so all the instructions in the chain used to deduce sign extension in order to eliminate the 'extsw' need to be promoted to 64-bit pseudo instructions. We need to promote the

Add pre-commit MIR test for PR "[Promote Pseudo Opcode from 32-bit to 64-bit after eliminating the extsw instruction in PPCMIPeepholes optimization](#85451)" which fixes bug reported in the issue "[Inconsistent Output at -O1 and -O2 Optimization Levels on PowerPC64 Due to Complex Type Casting and Nested Loop Structure](#71030)".
Description:
The Proof-of-Concept (PoC) code provided demonstrates an inconsistency in the computed results of an `unsigned long long int` variable when compiled using Clang-18 for the PowerPC64 architecture. The discrepancy is observed specifically under the optimization levels `-O1` and `-O2`. The output of `computedResultUll` displays inconsistency as shown below:

Environment:
`O1` and `O2` optimization levels.

PoC:

Expected Behavior:
Regardless of the optimization level, the value of `computedResultUll` should be consistently and accurately computed as an `unsigned long long int`.

Observed Behavior:
When compiled with Clang-18 under `-O1` and `-O2` optimization levels, the computed value for `computedResultUll` shows inconsistency:

Analysis:
The inconsistency is identified under the following conditions:
`unsigned long long int`.
These conditions are specific and intricate, but the inconsistency is notable and is not attributed to Undefined Behavior.

Steps to Reproduce:
`O1` and `O2` optimization levels.
`computedResultUll`.

Conclusion:
The observed inconsistency in extending values to `unsigned long long int`, under specific conditions involving complex loop structures and type casting operations, when compiled using Clang-18 at `-O1` and `-O2` optimization levels, warrants further investigation and resolution.