-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Fix round_coordinate behavior for arm64 (fixes Namco System 22 video issues on arm64) #14369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was old code written before C99 etc. expanded the standard maths library. But your change is inefficient (and also messes up indentation). The code is also dubious in that it rounds the midpoint towards negative infinity (round midpoint to even is standard IEEE754 behaviour, and round midpoint away from zero is the common approach taught at primary school). It also doesn’t attempt to deal with negative overflow in its current form.
Wouldn’t something like this work preserving existing behaviour for the midpoint, and avoid unnecessary conversions?
inline int32_t round_coordinate(BaseType value)
{
const BaseType ipart = std::floor(value);
if (ipart <= BaseType(std::numeric_limits<int32_t>::min())
return std::numeric_limits<int32_t>::min();
else if (ipart >= BaseType(std::numeric_limits<int32_t>::max())
return std::numeric_limits<int32_t>::max();
const BaseType fpart = vaue - ipart;
return int32_t(ipart) + ((fpart > 0.5) ? 1 : 0);
}
If we were to change it to round midpoint to even, it would probably work as:
inline int32_t round_coordinate(BaseType value)
{
if (value <= BaseType(std::numeric_limits<int32_t>::min())
return std::numeric_limits<int32_t>::min();
else if (value >= BaseType(std::numeric_limits<int32_t>::max())
return std::numeric_limits<int32_t>::max();
else
return int32_t(std::lround(value));
}
And yes, it’s ironic that there’s a comment that says “round in a cross-platform consistent manner” on a function that very much depends on implementation-defined behaviour.
I’ll look at it on my ARM test system later.
Hey, thank you so much for such a detailed and constructive review! You're absolutely right; my patch was pretty "brute force" just to fix the symptom on arm64. I'm not a C++ expert, and the subtleties of undefined behavior (UB) between different architectures are definitely tricky. I had a feeling the problem was in this area because of the Namco System 22 video glitches, but I can tell you, finding the exact "nut to tighten" wasn't easy! 😅 I really like your second suggestion (the one using std::lround with manual clamping). It's much cleaner and more modern. The reason my patch looked a bit 'clunky' and inefficient was that I was intentionally trying to mimic the style of the old, existing code. I was worried that dropping in a modern C++ function would 'clash' too much with the rest of the file. You're completely right that the whole area could probably use a good rewrite, but my intention was just to surgically fix the immediate problem (just tighten that one loose nut on the arm64 side!) without refactoring everything around it. In any case, I'm not a C++ super-expert either. I'm more of a "jack of all trades, master of none," haha. So I really appreciate you pointing out the cleaner, standard-compliant way to do it. Your modern approach is definitely the right way forward, though. Thanks! |
I’m sure there’s more unportable code lurking. The change to drcbec.cpp here also fixed something that was broken on clang/AArch64 (although in that case, no games were actually hitting the broken code, I caught it with unit tests): cee362f#diff-75d9be28f5eca6a565d011ad6c7b721e5e5c2c3c9a5346f92c0f8469074d81c4R1469 (In general, semantics are undefined for an explicit or implicit conversion that changes integer size and signedness at the same time.) |
The change isn’t safe – it’s possible that |
if (value >= BaseType(std::numeric_limits<int32_t>::max())) | ||
return std::numeric_limits<int32_t>::max(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know that if it doesn’t hit this test, int32_t(ipart)
will never give std::numeric_limits<int32_t>::max()
, because std::floor
always produces a lesser or equal value, and converting to integer rounds towards zero.
In cases where BaseType
can't represent std::numeric_limits<int32_t>::max()
precisely, BaseType(std::numeric_limits<int32_t>::max())
gets rounded away from zero so the test should still be safe.
const BaseType ipart = std::floor(value); | ||
if (ipart < BaseType(std::numeric_limits<int32_t>::min())) | ||
return std::numeric_limits<int32_t>::min(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be <
not <=
or it will fail to round when BaseType
can precisely represent a 32-bit integer and ipart == BaseType(std::numeric_limits<int32_t>::min())
and fpart > BaseType(0.5)
.
Thanks a lot for the review, @cuavas! You're absolutely right, that's a very subtle bug with the std::floor and int32_t::min() edge case. I had completely missed that. You really have a keen eye for the details! I appreciate you catching it. |
Fix round_coordinate portability and undefined behavior on arm64
The previous implementation of round_coordinate relied on undefined behavior (UB) when casting an out-of-range floating-point value (BaseType) to int32_t.
While this might appear to "work" on x86 (often by wrapping around), it produces incorrect results on architectures like arm64, which at least fixes weird video issues in Namco System 22 games.
This commit fixes the issue by:
Clamping: Explicitly saturating the input value to INT32_MAX or INT32_MIN before the cast, preventing the UB.
Safe Rounding: Adding a defensive check to prevent overflow if rounding up (from the > 0.5 logic) would push the result past INT32_MAX.
This ensures correct, portable behavior across all architectures.