RC7 crashes across the board on BPC. #4504

Closed
MoerasGrizzly opened this issue Jul 20, 2022 · 24 comments · Fixed by #4513
Labels
ai A feature or issue related to the AI algorithms bug An issue from unintended consequences

Comments

@MoerasGrizzly

MoerasGrizzly commented Jul 20, 2022

In several War in Heaven missions (that I've found thus far), the RC7 build hard crashes to desktop. These crashes do not occur on the RC6 build.

On Delenda Est (BP2-15) and Sunglare (BP2-16), the RC7 fast debug build throws the following error before the hard crash:

Warning: sexp-script-eval failed to evaluate string "proBoxValue()"; check your syntax
File: sexp.cpp
Line: 24031

In Pawns of a Board of Bone (BP2-14), the following error is thrown:

Assert: "aip->target_objnum == OBJ_SHIP || aip->target_objnum == OBJ_WEAPON || aip->target_objnum == OBJ_DEBRIS || aip->target_objnum == OBJ_ASTEROID || aip->target_objnum == OBJ_WAYPOINT"
File: aicode.cpp
Line: 10027
This function just discovered that Gemini 3 has an invalid target object type of 1. This is bad. Please report!

ntdll.dll! ZwWaitForSingleObject + 20 bytes
KERNELBASE.dll! WaitForSingleObjectEx + 142 bytes
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 16 bytes
[...]
[ This info is in the clipboard so you can paste it somewhere now ]


Use Debug to break into Debugger, Exit will close the application.

ntdll.dll! ZwWaitForSingleObject + 20 bytes
KERNELBASE.dll! WaitForSingleObjectEx + 142 bytes
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 16 bytes
ntdll.dll! RtlUserThreadStart + 43 bytes

On "Deals in Shadows" (BP2-13), the following error is thrown:

Assert: "aip->target_objnum == OBJ_SHIP || aip->target_objnum == OBJ_WEAPON || aip->target_objnum == OBJ_DEBRIS || aip->target_objnum == OBJ_ASTEROID || aip->target_objnum == OBJ_WAYPOINT"
File: aicode.cpp
Line: 10027
This function just discovered that Knight 2 has an invalid target object type of 1. This is bad. Please report!

ntdll.dll! ZwWaitForSingleObject + 20 bytes
KERNELBASE.dll! WaitForSingleObjectEx + 142 bytes
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 16 bytes
[...]
[ This info is in the clipboard so you can paste it somewhere now ]


Use Debug to break into Debugger, Exit will close the application.

ntdll.dll! ZwWaitForSingleObject + 20 bytes
KERNELBASE.dll! WaitForSingleObjectEx + 142 bytes
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
fs2_open_22_2_0_RC7_x64_AVX-FASTDBG.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 16 bytes
ntdll.dll! RtlUserThreadStart + 43 bytes
@Kiloku
Contributor

Kiloku commented Jul 20, 2022

This is related to the check added in #4500

@Kiloku Kiloku added bug An issue from unintended consequences ai A feature or issue related to the AI algorithms labels Jul 20, 2022
@Kiloku Kiloku added this to Needs triage in Bug Triage via automation Jul 20, 2022
@Kiloku Kiloku moved this from Needs triage to Critical Priority in Bug Triage Jul 20, 2022
@Goober5000
Contributor

I don't see the sexp-script-eval error in BP 3.0.7 running the latest FSO code.

MageKing17 added a commit to MageKing17/fs2open.github.com that referenced this issue Jul 20, 2022
PR scp-fs2open#4500 added some checks, but accidentally compared `aip->target_objnum` to an object type (although the assertion message correctly retrieved the object type, leading to the bizarre assertion that `OBJ_SHIP`, i.e. 1, was an invalid target object type). This commit makes them all access the `Objects[]` array and retrieve the `.type`, just as the assertion message does (and the pre-scp-fs2open#4500 conditional did).

Related to scp-fs2open#4504, in that it's the source of the failed assertions, although I can't be sure it was the cause of the crashes without testing.
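
A minimal sketch of the mix-up the commit message describes, assuming simplified stand-in types (`Object`, `AiInfo`, and both helper functions are hypothetical, not the engine's real definitions). Only `OBJ_SHIP` being 1 comes from the message above; the other enum values are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the engine's types.
// Only OBJ_SHIP == 1 is taken from the commit message.
enum ObjectType { OBJ_NONE = 0, OBJ_SHIP = 1, OBJ_WEAPON = 2 };

struct Object { int type; };           // one slot of the object array
struct AiInfo { int target_objnum; };  // an index into that array, not a type

// Buggy form: compares the *index* directly against a *type* constant.
bool target_is_ship_buggy(const AiInfo& aip) {
    return aip.target_objnum == OBJ_SHIP;
}

// Fixed form: looks the object up first, then compares its type.
bool target_is_ship_fixed(const AiInfo& aip, const std::vector<Object>& objects) {
    assert(aip.target_objnum >= 0 &&
           static_cast<std::size_t>(aip.target_objnum) < objects.size());
    return objects[static_cast<std::size_t>(aip.target_objnum)].type == OBJ_SHIP;
}
```

With a ship stored at index 5, the buggy form rejects a valid target because 5 is not equal to `OBJ_SHIP`, while the fixed form accepts it. That would be consistent with an assertion firing on a target whose type is actually 1.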
@MoerasGrizzly
Author

That's weird. Delenda Est crashes reliably with that error for me when running either RC7 or the 20-7-2022 nightly on BP 3.0.7

@Goober5000
Contributor

I tried both RC7 and the 7-20 nightly on BP 3.0.7, with mission bp2-15, and didn't get the sexp-script-eval error. Is your installation missing the prompt box script, perhaps?

@MoerasGrizzly
Author

MoerasGrizzly commented Jul 20, 2022

I uploaded the current installation.

That being said, the RC7 build also crashes in the BP 3.0.3 installation.

I should clarify: The crash does not occur on load, but a short time into the mission.

@Goober5000
Contributor

I still don't see the sexp-script-eval error. I see the aip->target_objnum errors, but not that one.

Do you have any checkpoints saved, and are you starting the mission from a checkpoint?

Baezon pushed a commit that referenced this issue Jul 21, 2022
PR #4500 added some checks, but accidentally compared `aip->target_objnum` to an object type (although the assertion message correctly retrieved the object type, leading to the bizarre assertion that `OBJ_SHIP`, i.e. 1, was an invalid target object type). This commit makes them all access the `Objects[]` array and retrieve the `.type`, just as the assertion message does (and the pre-#4500 conditional did).

Related to #4504, in that it's the source of the failed assertions, although I can't be sure it was the cause of the crashes without testing.
@z64555 z64555 added this to the Release 22.2 milestone Jul 22, 2022
@z64555
Member

z64555 commented Jul 23, 2022

> I should clarify: The crash does not occur on load, but a short time into the mission.

@MoerasGrizzly, is this true for all reported crashes on this issue?

@MoerasGrizzly
Author

Yes

@z64555
Member

z64555 commented Jul 25, 2022

OK, please check out RC8 as soon as we get it published. We haven't found the sexp-script-eval error yet, but according to Goober that shouldn't cause a crash. We'll demote the priority level to low, or close the issue, once it's confirmed that there are no crashes or freezes related to this issue.

@JohnAFernandez
Contributor

JohnAFernandez commented Jul 25, 2022

Got a report on Discord that the script crash is still occurring on RC8.

@MoerasGrizzly
Author

Script crash is still occurring on RC8 here as well.

@Goober5000
Contributor

In that case, can you answer my earlier question?

> Do you have any checkpoints saved, and are you starting the mission from a checkpoint?

And if you have any checkpoints from that mission, please upload them.

@MoerasGrizzly
Author

MoerasGrizzly commented Jul 26, 2022

No.

The crash also occurs on Sunglare (BP2-16), which lacks checkpoints altogether. It consistently shows up when the Masyaf jumps in.

@jg18
Member

jg18 commented Jul 26, 2022

This issue has been found with Battle of Neptune using July 25 nightly (scroll up for debug log text; earlier posts may have more info as well).

@Kiloku
Contributor

Kiloku commented Jul 26, 2022

I managed to reproduce the issue in the Battle of Neptune (in RC8, instead of the nightly) and I have a coredump that I can send anyone who might want to analyze it.

Looking into the debugger, I really can't tell what is happening, but it seems to be related to the Lua stack and memory issues.

@Goober5000
Contributor

Goober5000 commented Jul 26, 2022

I still have not been able to reproduce this. Tried RC8 and latest master, release and debug, no luck.

I hope it isn't caused by #4482. I don't know how it could be, but it fits the time frame.

@jg18
Member

jg18 commented Jul 27, 2022

@Goober5000 maybe offer @MoerasGrizzly a custom build of master with PR #4482 reverted and see if repro is still possible?

@Kiloku maybe try the same sort of build with your Battle of Neptune repro?

Were there other recent commits related to the Lua runtime environment?

@Goober5000
Contributor

According to @Kiloku, reverting 4482 did fix this issue. Unfortunately since 4482 itself was a fix, more investigation is needed.

@BMagnu, @asarium, anything you can think of?

@EatThePath
Contributor

EatThePath commented Jul 27, 2022

Copy-paste of my Discord stream-of-consciousness rummaging:

My hunch is that a lua_checkstack is failing, but doing so inside some try-catch mess that's hiding it from the outside world, and then the evaluation continues on in an unready state.
The one inside LuaFunction::call throws a Lua exception if it fails, but it may be called from inside somewhere that catches Lua exceptions. Like, for instance, script_state::EvalStringWithReturn.
Why this worked before that safety check and doesn't now I can't begin to untangle tonight, but I'm pretty sure that branch might have fruit if shaken hard enough.
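
A minimal sketch of the pattern this hunch describes, using only standard C++ and no Lua at all. Every name here (`check_stack`, `call_function`, `eval_with_return`, `EvalResult`) is a hypothetical stand-in, not the engine's actual code. The point is only that an exception thrown by an inner stack check can be absorbed by an outer wrapper, so the caller sees nothing more than a status flag.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a stack-space check that throws on failure.
void check_stack(int needed, int available) {
    if (needed > available) {
        throw std::runtime_error("out of stack space");
    }
}

// Inner call: relies on the check and lets the exception propagate.
int call_function(int needed, int available) {
    check_stack(needed, available);
    return 42;  // placeholder result of the "script function"
}

struct EvalResult {
    bool ok;
    int value;
    std::string error;
};

// Outer wrapper: catches the exception and turns it into a status.
// If the caller ignores `ok`, the failure is invisible from outside.
EvalResult eval_with_return(int needed, int available) {
    try {
        return {true, call_function(needed, available), ""};
    } catch (const std::runtime_error& e) {
        return {false, 0, e.what()};
    }
}
```

A caller that reads `value` without checking `ok` would carry on with a placeholder result, which is roughly the "continues on in an unready state" scenario.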

@asarium
Member

asarium commented Jul 27, 2022

Since the Discord conversation mentions errors from within vm_realloc, I would guess that we somehow exhaust the available memory, but lua_checkstack should only allocate something if we do not already have enough stack space available.

Do we know how many arguments we try to push onto the Lua stack?

@EatThePath
Contributor

In general, or in this case? Because I think proBoxValue only ever has zero.
This evening I'm going to try cramming a bunch of diagnostic logging into the functions surrounding the stack trace and see if anything weird jumps out at me.

@EatThePath
Contributor

My belief now is that the crashes here are a symptom of the problem #4482 was addressing. It added exceptions when out of stack space; Lua is running out of stack space, and the game is crashing rather than just continuing into unsafe memory, as it appears it would otherwise do.

It looks like EvalStringWithReturn leaks stack space, one index per call. Why this hasn't caused bigger problems before I have no idea. I've a PR in the works, hopefully it's a suitable solution.
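
A toy illustration of a one-slot-per-call leak, assuming a fixed-capacity stack that throws when full. `ToyStack`, `eval_leaky`, `eval_balanced`, and `calls_until_exhausted` are all hypothetical; this does not model the real Lua stack or the actual fix, and the capacity is an arbitrary number chosen for the demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy fixed-capacity value stack (hypothetical stand-in for a script VM stack).
class ToyStack {
public:
    explicit ToyStack(std::size_t capacity) : capacity_(capacity) {}
    void push(int v) {
        if (slots_.size() >= capacity_) {
            throw std::overflow_error("stack exhausted");
        }
        slots_.push_back(v);
    }
    int top() const { return slots_.back(); }
    void pop() { slots_.pop_back(); }
    std::size_t size() const { return slots_.size(); }

private:
    std::size_t capacity_;
    std::vector<int> slots_;
};

// Leaky evaluation: reads the result but never pops it, so every call
// leaves one extra slot behind.
int eval_leaky(ToyStack& s) {
    s.push(7);
    return s.top();
}

// Balanced evaluation: restores the stack to its depth on entry.
int eval_balanced(ToyStack& s) {
    const std::size_t depth = s.size();
    s.push(7);
    const int result = s.top();
    while (s.size() > depth) {
        s.pop();
    }
    return result;
}

// Counts how many calls succeed before the stack is exhausted,
// giving up after max_calls.
template <typename Eval>
int calls_until_exhausted(Eval eval, std::size_t capacity, int max_calls) {
    ToyStack s(capacity);
    int calls = 0;
    try {
        while (calls < max_calls) {
            eval(s);
            ++calls;
        }
    } catch (const std::overflow_error&) {
        // Stack ran out; report how far we got.
    }
    return calls;
}
```

In this toy, the leaky version fails after exactly as many calls as the stack has slots, while the balanced version can be called indefinitely. That fits the report that the crash appears a short time into the mission rather than on load.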

@MoerasGrizzly
Author

MoerasGrizzly commented Jul 28, 2022

Nicely done folks. Looking forward to the next RC. :-)

@MoerasGrizzly
Author

Having done quite a bit of playing with the most recent nightlies on BPC, I've detected no more crashes.


8 participants