Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path finding broken on Raspberry Pi #2088

Closed
2 of 7 tasks
MarcelHB opened this issue May 18, 2024 · 28 comments · Fixed by #2090
Closed
2 of 7 tasks

Path finding broken on Raspberry Pi #2088

MarcelHB opened this issue May 18, 2024 · 28 comments · Fixed by #2090

Comments

@MarcelHB
Copy link
Collaborator

Bug description

I have found some mysterious bug, and only when running GemRB on a RPi.

Essentially almost all movement attempts fail on all the (IWD2) maps I've been trying, just leaving this log line as a clue:

Log(DEBUG, "WalkTo", "{} re-pathing ignoring actors", fmt::WideToChar{actor->GetShortName()});

Yet the 2nd attempt won't work, too.

It's bad at master and 43647ce already and I'll try a bisection but that takes a little time since there were also compiler errors around.

GemRB version (check as many as you know apply)

  • master as of this issue
  • 0.9.2
  • 0.9.1
  • 0.9.0

Video Driver (check as many as you know apply)

  • SDL1.2
  • SDL2 built with OPENGL_BACKEND enabled
  • SDL2 without OPENGL_BACKEND enabled
@MarcelHB MarcelHB changed the title Path finding broken on Rasberry Pi Path finding broken on Raspberry Pi May 18, 2024
@MarcelHB MarcelHB added this to the 0.9.3 - TBN milestone May 18, 2024
@MarcelHB
Copy link
Collaborator Author

Appears that 43647ce has introduced the problem.

@MarcelHB
Copy link
Collaborator Author

Simple fix proposal that works, since everything else looked tedious to check also w. r. t. compiler settings and compilation times on RPi:

  1. Make NormalizeDeltas a template function.
  2. Replace every occurrence of float_t in PathFinder.* by double.

@lynxlynxlynx
Copy link
Member

Why 1 though? Everyone passes in the same type ...

@MarcelHB
Copy link
Collaborator Author

I'm seeing float_t-users in Map.cpp and Scriptable.cpp.

@bradallred
Copy link
Member

I'm having difficulty conceiving NormalizeDeltas producing a meaningful difference between float and double. If true tho, you should be able to repro on another system by changing it to float and deducing why.

@MarcelHB
Copy link
Collaborator Author

When I discovered that, I wrote a test on Linux on both x86_64 and RPi-ARM that report sizeof(float_t) and FLT_EVAL_METHOD with 4, 0, with -O3 -ffast-math GCC v12/13. I assume that both platforms go float here but I can't reproduce this outside of the RPi.

Can try to write a test case later to look for differences between float and double on both platforms.

@MarcelHB
Copy link
Collaborator Author

What did I do: rotate vector (2, 2) by 1° up to 360 times and then put x, y every time into NormalizeDeltas. One calculation is done over float types, one over double. I found two cases at 15° and 75° where float and double form a different pair between ARM and x86_64.

Compiled over: -O3 -ffast-math -frounding-math

So yes, there are cases where the downgrade from double to float give us a different output on different platforms here.

@bradallred
Copy link
Member

It might help if you elaborated a bit. Have you ruled out -ffast-math and -frounding-math? It would still be nice to know the specifics so we can consider implications beyond NormalizeDeltas.

@MarcelHB
Copy link
Collaborator Author

MarcelHB commented May 18, 2024

  • -ffast-math -frounding-math: ARM and x86_64 show different results in the float calculation
  • -frounding-math: ARM and x86_64 show the same differences, and on each platform there is the same one difference in the float calculation compared to -ffast-math
  • no add. flags, just -O3: suprisingly, still a diff between them
  • -O0: still different output

Let's try being explicit about the rounding mode, I pick FE_DOWNWARD: Whoop, one of the two cases has disappeared, one line of diff is left. 🤷

@MarcelHB
Copy link
Collaborator Author

MarcelHB commented May 18, 2024

std::hypot generates some diff noise when using it, in the less significant parts of course but I better don't try to analyze the numerical impact here now.

@bradallred
Copy link
Member

these type of comparisons seems suspicious dx == 0.0. I believe thats the sort of thing thats not going to be well defined with fast-math and also that 0.0 is a double, not a float.

@MarcelHB
Copy link
Collaborator Author

Looking at godbolt.org assembly output for such a check, that does not seem to make any difference (using -ffast-math or not).

@bradallred
Copy link
Member

it looks like there is a bug here: dy = STEP_RADIUS;. If I'm not mistaken, that should be dy = STEP_RADIUS * 0.75;, no?

@MarcelHB
Copy link
Collaborator Author

I guess so.

@bradallred
Copy link
Member

std::hypot generates some diff noise when using it, in the less significant parts of course but I better don't try to analyze the numerical impact here now.

If the problem is std::hypot, we could use an approximation replacement like alpha beta.

@MarcelHB
Copy link
Collaborator Author

Sounds somewhat counter-intuitive to fix a problem of small errors with even bigger errors?

Also, not sure if NormalizeDeltas is even the root cause here, given that the numbers we put in there shouldn't be that large to suddenly result in the blue and make the algorithm fail that largely. It's just an easy method to quickly validate whether there are platform discrepancies in the type downgrade between the two archs.

@bradallred
Copy link
Member

Sounds somewhat counter-intuitive to fix a problem of small errors with even bigger errors?

I don't know why you're insisting the problem is one of precision. just because the problem "goes away" when you use a double means nothing. The math shouldn't require that, all we are doing is scaling a velocity along a vector. A better performing implementation for something on a hot path is always welcome too, provided it is accurate enough and I don't see why it wouldn't be.

@lynxlynxlynx
Copy link
Member

The problem goes away also when using floats, so I wouldn't discount precision so easily. I can't make sense of that at FLT_EVAL_METHOD being 0 though.

@MarcelHB
Copy link
Collaborator Author

MarcelHB commented May 19, 2024

I don't know why you're insisting the problem is one of precision. just because the problem "goes away" when you use a double means nothing.

In 2nd paragraph I meant to say that if we have precision problem here on a such smale scale (affecting our calculations in maybe 2/360 cases) and this ends up making 95% of movements invalid already, less precision doesn't look like the fix. But also, the whole image does not fit since the errors still look too small for that impact, and hypot and NormalizeDelta was just an unfortunate test to see that some math issues aren't portable.

My gut feeling was somewhat right, this is probably not the place to focus on, at least for this problem. I put traces into the code and tested a case with the same journey on one PC on a map with no moving actors. So full determinism, if I'm not wrong.

The result shows some interesting problem. This my desktop:

[FindPath/DEBUG]: s = (1019, 808), d = (684, 684), caller = Nummer Zwei, dist = 0, size = 2
[FindPath/DEBUG]: nmptCurrent: 1019, 808
[FindPath/DEBUG]: smptCurrent: 63, 67
[FindPath/DEBUG]: Idx: 5423
[FindPath/DEBUG]: nmptChild: 1035, 808 (0, 1)
[FindPath/DEBUG]: smptChildIdx: 67 * 80 + 64 = 5424
[FindPath/DEBUG]: layz T*: 1019, 808, 1035, 808
[FindPath/DEBUG]: nmptChild: 1019, 820 (1, 0)
[FindPath/DEBUG]: smptChildIdx: 68 * 80 + 63 = 5503
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 820
[FindPath/DEBUG]: nmptChild: 1003, 808 (2, -1)
[FindPath/DEBUG]: smptChildIdx: 67 * 80 + 62 = 5422
[FindPath/DEBUG]: Next: child blocked
[FindPath/DEBUG]: nmptChild: 1019, 796 (3, 0)
[FindPath/DEBUG]: smptChildIdx: 66 * 80 + 63 = 5343
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 796
[FindPath/DEBUG]: nmptCurrent: 1019, 796
[FindPath/DEBUG]: smptCurrent: 63, 66
[FindPath/DEBUG]: Idx: 5343

And this is the Raspberry:

[FindPath/DEBUG]: s = (1019, 808), d = (684, 684), caller = Nummer Zwei, dist = 0, size = 2
[FindPath/DEBUG]: nmptCurrent: 1019, 808
[FindPath/DEBUG]: smptCurrent: 63, 67
[FindPath/DEBUG]: Idx: 5423
[FindPath/DEBUG]: nmptChild: 1019, 820 (0, 0)
[FindPath/DEBUG]: smptChildIdx: 68 * 80 + 63 = 5503
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 820
[FindPath/DEBUG]: nmptChild: 1019, 820 (1, 0)
[FindPath/DEBUG]: smptChildIdx: 68 * 80 + 63 = 5503
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 820
[FindPath/DEBUG]: nmptChild: 1019, 820 (2, 0)
[FindPath/DEBUG]: smptChildIdx: 68 * 80 + 63 = 5503
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 820
[FindPath/DEBUG]: nmptChild: 1019, 820 (3, 0)
[FindPath/DEBUG]: smptChildIdx: 68 * 80 + 63 = 5503
[FindPath/DEBUG]: layz T*: 1019, 808, 1019, 820
[FindPath/DEBUG]: nmptCurrent: 1019, 820
[FindPath/DEBUG]: smptCurrent: 63, 68
[FindPath/DEBUG]: Idx: 5503

We can focus on [FindPath/DEBUG]: nmptChild: 1019, 820 (0, 0), that's the result of

NavmapPoint nmptChild(nmptCurrent.x + 16 * dxAdjacent[i], nmptCurrent.y + 12 * dyAdjacent[i]);
Log(DEBUG, "FindPath", "nmptChild: {}, {} ({}, {})", nmptChild.x, nmptChild.y, i, int8_t{dxAdjacent[i]});

What we see is that on the RPi none of the accesses to dxAdjacent[i] is correct, its [0, 0, 0, 0] where it should have been [0, 1, 0, -1] ... if we use float_t instead of double in this file.

With RelWithDebInfo and breaking into this line, the structure of dxAdjacent looks correct though. So far now,

  • no idea why values are wrong, maybe some bogus assembly code? Snippet for ref below
  • no idea how double <-> float_t has its hands in there,
  • couldn't reproduce in a minimal test case yet
ARMv8 assemler code in PathFinder.cpp:374cmp x20, #0x0 add x0, x20, #0x3f ldr x2, [sp, #200] csel x0, x0, x20, lt // lt = tstop negs x1, x20 and x20, x20, #0x3f asr x0, x0, #6 and x1, x1, #0x3f csneg x1, x20, x1, mi // mi = first add x0, x2, x0, lsl #3 tbz x1, #63, 0x7ff7e48fe4 add x1, x1, #0x40 sub x0, x0, #0x8 ldr x2, [x0] mov x3, #0x1 // #1 adrp x4, 0x7ff7f0b000 add x4, x4, #0x138 lsl x1, x3, x1 orr x1, x2, x1 str x1, [x0] add x0, x23, #0x10 stp x4, x0, [sp, #160] add x24, sp, #0x1c8 ldr x4, [sp, #264] add x19, x4, #0x138 add x21, x4, #0x140 ldrsb w3, [x21] mov x0, x24 ldrsb w4, [x19] // <---- PC at L374 ldr w1, [sp, #440] ldr w2, [sp, #444] add w3, w3, w3, lsl #1 add w1, w1, w4, lsl #4 add w2, w2, w3, lsl #2 bl 0x7ff7cccf50 <_ZN5GemRB5PointC1Eii@plt> mov x0, x24 bl 0x7ff7ccc570 <_ZN5GemRB3Map18ConvertCoordToTileERKNS_5PointE@plt>

@lynxlynxlynx
Copy link
Member

Wow. Have you tried making them a static const (since constexpr working on classes is apparently a c++20 feature) or compiling with clang?

@MarcelHB
Copy link
Collaborator Author

  • static did not work
  • plain old C arrays did not work
  • replacing 12/16 by prime numbers did not work (this looks like a valid and desperate attempt when looking at the ARM math code that is being generated here)
  • replacing the index access by a function with some case statements works

Will try clang later.

@MarcelHB
Copy link
Collaborator Author

MarcelHB commented May 19, 2024

Eh ... godbolt put me in some good direction. First I was thinking that GCC was outsmarting me with taking all these 0, 1, -1 into its emitted assembly code or somthing since there was something disappearing at higher levels of O (see below). But no, this is a minimal case roughly looking looking like our problem (this is -O2):

gcc_rpi1

Details of the assembly are irrelevant, I guess, and also constexpr or static don't make a difference. What does make a difference is dropping static/constexpr altogether, and suddenly something re-appears at the bottom:

gcc_rpi2

Although there are no apparent references to dxAdjacent in the code (probably relative to that .LANCHOR0), that made me puzzled a little. Turns out then it works for this bug ...

Looking at the result of -S in both cases locally (to rule out godbolt isn't just cutting things away by some unknown fashion), the array data is there but the code is somewhat different – slightly, visible in the screenshots, too.

@bradallred
Copy link
Member

bradallred commented May 19, 2024

The problem goes away also when using floats, so I wouldn't discount precision so easily. I can't make sense of that at FLT_EVAL_METHOD being 0 though.

you're misunderstanding me. What I'm saying is what we want to do here (normalize a velocity along a vector) does not require double precision. We should change our algorithm if it requires that. we would probably get better performance using another (approximate) method too.

We also shouldn't discount the sort of assumptions fast-math makes. Remember that these std math functions are templates or compiler builtins, so their internals are also subject to fast-math.

@MarcelHB
Copy link
Collaborator Author

Clang v14 is good. So what to do? My best guess is that there is some kind of problem that is isolated to GCC (v12) on (RPi-)ARM.

  • Live with it and recommend Clang if somebody complains
  • Investigate deeper and find a fix, given that it is not a GCC bug
  • Don't investigate deeper and write some macros to make it work somehow

Indications of some kind of bug in GCC are strong but breaking this down into a small test case only gives clues but no reproducibility yet.

@lynxlynxlynx
Copy link
Member

Can you try with 13 or 14? Ideally it'd be a versioned problem and we could simply handle it in cmake.

@MarcelHB
Copy link
Collaborator Author

Hm, don't know. Raspbian Bookworm only provides v12 for now.

@MarcelHB
Copy link
Collaborator Author

Docker to the rescue again, although this isn't much fun on such a device. No improvement with GCC v13.

@lynxlynxlynx
Copy link
Member

Ok, thanks. I think the sanest thing for now is to block gcc RPI builds and if anyone external gets annoyed enough to investigate further successfully, we can lift the ban. Should be trivial to do in CONFIGURE_RPI_SPECIFICS, since by the time it's ran, the compiler has already been detected. Let me know if you want me to handle it.

MarcelHB added a commit to MarcelHB/gemrb that referenced this issue May 21, 2024
MarcelHB added a commit to MarcelHB/gemrb that referenced this issue May 21, 2024
MarcelHB added a commit to MarcelHB/gemrb that referenced this issue May 21, 2024
@lynxlynxlynx lynxlynxlynx linked a pull request May 21, 2024 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants