Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undefined behavior due to shifts on signed integers #84

Open
Flamefire opened this issue Jul 30, 2019 · 16 comments
Open

Undefined behavior due to shifts on signed integers #84

Flamefire opened this issue Jul 30, 2019 · 16 comments

Comments

@Flamefire
Copy link
Contributor

As found in #65 (comment) using bit shifts on signed integers is undefined behavior. Reason for that is the difference between a "regular shift" (shr) and "arithmetic shift" (sar) where the former just shifts bits while the latter preserves the sign bit in all cases. Also there is a rounding difference for positive and negative numbers.

Examples:

-1 = b11111111
-1 shr 1 = b01111111 = 127
-1 sar 1 = b11111111 = -1

-10 sar 2 = -3
10 sar 2 = 2

This may not be a problem in practice as usually int >> value is translated to sar but I found 1 compiler where it is translated to shr: https://godbolt.org/z/sj_uO6

Not sure if/how to solve this as i >> value is meant as floor(i/2^value) while (int)(i/2^value) is trunc(i/2^value) and hence cannot be "optimized" to a shift.

@erikd
Copy link
Member

erikd commented Aug 5, 2019

@Flamefire
Copy link
Contributor Author

The left shift will run into UB if the range of an int is reached. Interestingly this UB makes it produce the expected SAL O.o

The right shift is... interesting. It relies on 2s-complement or at least that the highest bit is 1 for negative numbers. I haven't found a clear definition what happens for negation of negative numbers as it isn't defined that 2s-complement is used. In C++20 left and right shifts are fully defined though: https://en.cppreference.com/w/cpp/language/operator_arithmetic

I don't have a solution. Just suggestion: In the linked functions make the shift value unsigned. A negative shift is UB. And maybe add an assertion on x in arith_shift_left that INT_MIN>>shift <=x <= INT_MAX >> shift (or so)

@erikd
Copy link
Member

erikd commented Aug 5, 2019

Why would you add an assertion? Aren't all the shifts constant values?

Casting from signed to unsigned, performing the shift and casting back should keep the C compiler happy. Doing this in an inline function (if the shifts are constant, they can be embedded in the actual function) seems like the solution.

@Flamefire
Copy link
Contributor Author

Why would you add an assertion? Aren't all the shifts constant values?

The assertion should be on the x not the shift. E.g. x = (INT_MAX >> 15) passed to shiftLeft(x, 16) is UB (until C++20). So the assert should catch such overflows which I assume are unintended anyway.

@erikd
Copy link
Member

erikd commented Aug 6, 2019

I tired using the undefined behavior sanitizer on x86_64 and found no issues related to shifts at all. Eg

CFLAGS="-fsanitize=undefined -g" CXXFLAGS=${CFLAGS} ./configure
make clean all check

@Flamefire
Copy link
Contributor Author

UBSAN not erroring out is not a guarantee that there is no UB.

The code uses e.g. lrint(scaled_value) >> 16 where scaled value can be negative and that is UB as the standard states:

For negative a, the behavior of a << b is undefined [...] For negative a, the value of a >> b is implementation-defined

Above I linked 1 compiler (I admit a rather special one) that does not use SAR but SHR which will lead to wrong results.

@erikd
Copy link
Member

erikd commented Aug 6, 2019

UBSAN not erroring out is not a guarantee that there is no UB.

In this case it actually is, but whatever.

The potential for undefined behavior at that that has still not been demonstrated to be the cause of the problem seen on PowerPC. The obvious way to determine whether this undefined behavior is the cause the PowerPC issue is to replace >> 16 with / 65536.

@Flamefire
Copy link
Contributor Author

I double checked and you are right: No shift is UB as the only problematic shift is in int_to_fp which is called with filter->coeff_half_len. As this can never be negative I'd define it as unsigned instead of int to reflect it.

However it does run into implementation defined behavior due to the right shift of a signed long in src_float_to_short_array, the uses of fp_to_int should not run into that.

The cause for the PowerPC problem is not the shift but a buggy lrintf which does not handle large values correctly. This issue is unrelated but was found during investigation of possible causes for the other bug.

@erikd
Copy link
Member

erikd commented Aug 6, 2019

I 100% believe that PowerPC (or more likley the compiler) has a problem but you have yet to provide any real evidence that UB is happening here. You need to provide that evidence.

@Flamefire
Copy link
Contributor Author

As written there is no UB, but only implementation defined behaviour which is in src_float_to_short_array:

  • a negative float is passed to src_float_to_short_array (e.g. in the tests)
  • that float is scaled by a positive factor so it stays negative
  • that is converted to a signed long via lrint
  • this negative signed long is right-shifted by 16 -> implementation defined result

Solution: Use code like this:

void
src_float_to_short_array (const float *in, short *out, int len)
{	double scaled_value ;
	while (len)
	{	len -- ;
		scaled_value = in [len] * 32768 ;
		if (scaled_value >= 32767.f)
		    out [len] = 32767 ;
		else if (scaled_value <= -32768.f)
		    out [len] = -32768 ;
                else
		    out [len] = (short) (lrint (scaled_value)) ;
	} ;
}

@erikd
Copy link
Member

erikd commented Aug 7, 2019

I will ask yet again, where is the evidence that this is UB?

Did you even try my suggestion of:

The obvious way to determine whether this undefined behavior is the cause the PowerPC issue is to replace >> 16 with / 65536.

@Flamefire
Copy link
Contributor Author

I will ask yet again, where is the evidence that this is UB?

I don't understand the question. With the steps described above a right shift of a negative signed integer is performed. Do you agree with that?
The C++ standard (which I quoted above) and the C standard define the result of a right shift of a negative signed integer as "implementation defined".
Hence it is shown that implementation defined behavior is invoked.

What else do you need?

Did you even try my suggestion of:

No, why? I don't even have a buggy PowerPC platform and the shift is unrelated to the bug reported in #65.

The current implementation works because a right shift of a signed int is "usually" defined (by the compiler in use!) to be an arithmetic shift, but a compiler on another platform may not do that (which is the meaning of "implementation defined"), see also https://stackoverflow.com/a/7636/1930508 or Wikipedia:

The (1999) ISO standard for the programming language C defines the right shift operator in terms of divisions by powers of 2.[8] Because of the above-stated non-equivalence, the standard explicitly excludes from that definition the right shifts of signed numbers that have negative values. It does not specify the behaviour of the right shift operator in such circumstances, but instead requires each individual C compiler to define the behaviour of shifting negative values right

@erikd
Copy link
Member

erikd commented Aug 7, 2019

I am very hesitant to change code without evidence that the change actually fixes something.

@Flamefire
Copy link
Contributor Author

Sure: My proposed code does not rely on implementation defined behavior (fixes this issue) and additionally does not run into the bug @janstary has with his machines lrint implementation due to keeping the numbers low. This is proven by him at #65 (comment) :

That makes the float-to-short test pass for me

@janstary
Copy link
Contributor

janstary commented Aug 9, 2019

UBSAN not erroring out is not a guarantee that there is no UB.

In this case it actually is, but whatever.

The potential for undefined behavior at that that has still not been demonstrated to be the cause of the problem seen on PowerPC. The obvious way to determine whether this undefined behavior is the cause the PowerPC issue is to replace >> 16 with / 65536.

With that change (to the current git), the test fails with

tests/float_short_test

        float_to_short_test ............................. 

        Line 64 : out [1] == 0

@Flamefire
Copy link
Contributor Author

That is expected:

When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded. [...] This is often called ‘‘truncation toward zero’’.

So: -1 >> n == -1 for all n but -1 / n == 0 for n>1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants