Windows large pages commit - modern version instantly crashing #2677
Comments
Update: I checked the latest build on another, more modern computer (Haswell CPU, BMI2 Stockfish build). The problem is exactly the same. I have not added the newest build to any GUI yet, since it is obvious that the problem exists after the Lock pages commit. |
What do you mean exactly with "But when I try to close the command prompt window, the engine stops working"? |
@CoffeeOne I mean that if I close the running engine window in the normal way, the engine crashes and a Windows application error report follows. This is not good. It never happened before the latest commit. |
I tried to reproduce this, with and without Large Pages, using the builds from abrok, to no avail. Can you post your system information (search for "system information" in the search bar and run the app)?
|
One thing I would try is to run Windows updates until nothing more is offered. That is standard procedure when an app that always worked before stops working. |
Not sure if this is related, but I compiled the latest version (large pages) on Windows, with large pages off, and while I'm not getting any crashes or errors, the executable takes noticeably longer to close (around 2 to 3 seconds), whereas all prior executables close pretty much instantaneously when the "X" button is clicked. This is repeatable too, and I'm not doing anything differently than I normally would. I checked, and this behavior is not present in builds that go only up to the TBTables::Entry commit, so I'm fairly certain it has something to do with the latest Large Pages commit. I also find it strange that this effect occurs at all, given that I'm not even using the functionality. |
@silversolver1 Can you please check the Application event log? I assume you will find that Stockfish crashes after those 2-3 seconds of freezing. This is exactly the behavior I got on the second machine. On the first one, Windows crash messages appeared after a few seconds of freezing; on the second machine only the freeze was visible, but I thought to look in the event log and found the same crashes. |
@MichaelB7 The cause is definitely not the hardware, as this problem occurs on two different computers. I also did not update Windows before the crash issue appeared. It occurs exclusively in the latest Stockfish build with the Large pages commit. Right now I am running previous Stockfish builds on both PCs and none of them has a similar crash problem. |
It could be the other commit regarding tablebases. I will post a link to my compiled version of Stockfish that does not have the tbtables commit, since that commit was problematic for me and it wasn't an Elo gainer, so I just ignored it.
edit: here's the file. Functionally it's the same Stockfish as SF, but it does have other bells and whistles. Let me know whether it shows the same behavior. It has LP, slightly modified, but it does not have commit 86ee4eb, which was also committed today.
[Stockfish-dev-051320.zip](https://github.com/official-stockfish/Stockfish/files/4625731/Stockfish-dev-051320.zip)
|
@Coolchessguykevin Yes, I checked and you were right: it shows up with event name APPCRASH. Some of the seemingly relevant details in the crash log are: P4: ntdll.dll https://stackoverflow.com/questions/9497384/how-to-interpret-windows-appcrash-mysterious-log From there, according to Google, exception code c0000005 indicates an access violation. The only real thing I could notice is in tt.cpp's TranspositionTable::resize. There, the variable mem is not declared, defined, or initialized in any way that I can tell. It does not seem to have a type, is not passed in as a parameter, is not a global variable, and does not point to anything either, so I am not sure how it can be used (although I could of course be missing something ;) ). |
@skiminki could you have a look? From the description it seems that the TT.resize(0) at the end might be causing trouble. |
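For readers decoding the event log entry: exception code c0000005 is the NTSTATUS value STATUS_ACCESS_VIOLATION. A minimal sketch of how such a 32-bit code splits into its documented bit fields is shown here. The struct and function names are made up for illustration and are not part of Stockfish or any Windows header.

```cpp
#include <cassert>
#include <cstdint>

// Bit fields of a 32-bit NTSTATUS value.
// NtStatusFields and decode_ntstatus are hypothetical names for this sketch.
struct NtStatusFields {
    unsigned severity;  // bits 31-30: 0 success, 1 informational, 2 warning, 3 error
    bool     customer;  // bit 29: set for customer-defined codes
    unsigned facility;  // bits 27-16: subsystem that raised the status
    unsigned code;      // bits 15-0: status code within the facility
};

constexpr NtStatusFields decode_ntstatus(std::uint32_t status) {
    return NtStatusFields{
        (status >> 30) & 0x3u,
        ((status >> 29) & 0x1u) != 0,
        (status >> 16) & 0xFFFu,
        status & 0xFFFFu
    };
}

// The exception code reported in the APPCRASH entry (STATUS_ACCESS_VIOLATION).
constexpr std::uint32_t kAccessViolation = 0xC0000005u;

static_assert(decode_ntstatus(kAccessViolation).severity == 3,
              "0xC0000005 carries the 'error' severity");
```

The value only says that some code touched memory it was not allowed to touch. It does not by itself identify which call did so, which is why a debugger trace is asked for later in the thread.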
@silversolver1 mem is a member variable of the TranspositionTable class (see tt.h line 95). This is certainly not related to the TBTables::Entry commit. |
@silversolver1 could you try the following? It is a bit of a shot in the dark, but it would be useful to exclude it as a cause:
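To illustrate the point about class members: inside a member function, a bare name such as mem resolves to the object's own data member, so it needs no local declaration or parameter. The class below is a simplified stand-in written for this note, not the real TranspositionTable; only the member name mem is taken from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-in for a table class; NOT the actual Stockfish code.
class Table {
public:
    Table() = default;
    Table(const Table&) = delete;
    Table& operator=(const Table&) = delete;
    ~Table() { std::free(mem); }

    // 'mem' is not declared in this function: the bare name refers to
    // this->mem, the private data member declared at the bottom of the class.
    void resize(std::size_t bytes) {
        std::free(mem);
        mem  = bytes ? std::malloc(bytes) : nullptr;
        size = mem ? bytes : 0;
    }

    bool allocated() const { return mem != nullptr; }
    std::size_t bytes() const { return size; }

private:
    void*       mem  = nullptr;  // declared once here, visible in all member functions
    std::size_t size = 0;
};
```

Reading only the .cpp file therefore makes the variable look undeclared; the declaration lives in the class definition in the header.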
|
@vondele Sure, I compiled a version with just the line |
OK, thanks. So that's not the issue. The Windows binary (including the abrok one) runs fine under Wine on Linux. @silversolver1 could you reproduce this crash in a debugger and confirm the location of the crash? Also, as asked previously, I guess this will be specific to the Windows version, so can you provide that as well? |
It happens only on SIGINT sent to the process (so Ctrl-C or just closing the terminal window with SF running). The 'quit' command works with no problem. |
I think I have reproduced this. I get this behavior + entry in the application log if I close the stockfish terminal window with the "close window" button. However, I do not see this issue if I use command 'quit'. My reproduction steps precisely:
However, the following works without issues
Yeah, I think this has to do with SIGINT, and possibly with us not getting a chance to release the VirtualAlloc allocations. I'll post the events in a follow-up comment. |
This is about NTDLL.DLL crashing because we didn't release the VirtualAlloc allocations in the Ctrl+C exit path. I get the same behavior on the 'quit' exit path with the following patch:
With this patch, the following gives a crash:
Actually, I don't think we need the crash events here. I'll investigate a bit further. |
Ok, the crash is most likely because of a left-over. That is, the TranspositionTable destructor frees mem if it's non-null. The fix is to delete that, because we now free the transposition table manually. Or rather, maybe we should delete that resize(0) stuff and replace free(mem) with an aligned_ttmem_free(mem) call. This was a bug in my patch. I'll send a PR soon.
So with this patch, I don't see crashes. TBH, I'm a bit surprised that Windows calls the static destructors on window close after aborting the program. |
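The general pattern behind that diagnosis can be sketched as follows: memory obtained from a special-purpose allocator must be returned through the matching deallocator on every exit path, including the destructor. This is a portable illustration under stated assumptions, not the code from the PR. special_alloc and special_free are hypothetical stand-ins (built on malloc) for the VirtualAlloc-backed allocation and aligned_ttmem_free mentioned in the thread, and live_blocks exists only so the behavior can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace sketch {

int live_blocks = 0;  // blocks currently held via special_alloc (for checking only)

// Hypothetical special-purpose allocator: returns a 64-byte aligned pointer
// and stores the raw malloc pointer in the slot just before it.
void* special_alloc(std::size_t bytes) {
    const std::size_t align = 64;
    void* raw = std::malloc(bytes + align + sizeof(void*));
    if (!raw) return nullptr;
    std::uintptr_t p = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    p = (p + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
    reinterpret_cast<void**>(p)[-1] = raw;
    ++live_blocks;
    return reinterpret_cast<void*>(p);
}

// Matching deallocator. Passing the aligned pointer to plain std::free()
// instead would be undefined behavior, since it is not what malloc returned.
void special_free(void* p) {
    if (!p) return;
    std::free(static_cast<void**>(p)[-1]);
    --live_blocks;
}

class Table {
public:
    Table() = default;
    Table(const Table&) = delete;
    Table& operator=(const Table&) = delete;
    ~Table() { release(); }  // destructor uses the same deallocator as every other path

    bool resize(std::size_t bytes) {
        release();
        if (bytes) mem = special_alloc(bytes);
        return bytes == 0 || mem != nullptr;
    }

    // Idempotent: a manual release followed by the destructor is harmless.
    void release() { special_free(mem); mem = nullptr; }

    bool aligned64() const { return reinterpret_cast<std::uintptr_t>(mem) % 64 == 0; }

private:
    void* mem = nullptr;
};

}  // namespace sketch
```

Because release() resets mem to null, it does not matter whether the table is freed manually before exit or only by the destructor, so a manual-free path and an abrupt-exit path behave the same way.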
Candidate PR to fix this: #2679 |
I haven't yet tried anything with a debugger, but as I commented in #2679, the PR seems to remedy the problem: I no longer get any application crashes or errors in the Windows Event Viewer log (running Windows 10). The hang on closing via Ctrl-C and the "X" button also seems to be gone now, so it looks like this worked. |
yes, will be fixed after merging. |
I just downloaded the latest Stockfish build ("Add support for Windows large pages" commit) from the abrok.eu site.
I ran the engine exe file in the command prompt (just opened the exe file without adding it to any GUI) and saw the following:
Of course I did not change the system policy to use page locking.
Bench command is working well.
But when I try to close the command prompt window, the engine stops working (with the corresponding Windows message and a log entry in the Event Viewer). This happens only when the latest modern build is launched (my processor is quite old and does not support BMI instructions); all previous builds up to the Lock pages commit work without any problems.
System: Win 10 Pro x64 (November 2019 update).