Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise stack size to 8MB for pthreads #2303

Closed
wants to merge 1 commit into from

Conversation

noobpwnftw
Copy link
Contributor

@noobpwnftw noobpwnftw commented Sep 14, 2019

It seems there is no other way to specify stack size on std::thread than linker flags and the effective flags are named differently in many toolchains. On toolchains where pthread is always available, this patch put the stack size change in the C++ code to ensure minimum stack size, instead of relying on linker defaults which may be platform-specific.

Also raises default stack size on OSX to current Linux default (8MB) just to be safe.

No functional change

… size instead of relying on linker defaults which may be platform-specific.

Also raises default stack size on OSX to current Linux default(8MB) just to be safe.

Note: it seems there is no other way to specify stack size on std::thread than linker flags and the effective flags are named differently in many toolchains.

No functional change
@snicolet
Copy link
Member

snicolet commented Sep 14, 2019

For Mac OS X default stack size for main and auxiliary threads, a good documentation is the platform_thread_posix.cc file they use at Gggogle to build Chromium: https://chromium.googlesource.com/chromium/chromium/+/master/base/threading/platform_thread_posix.cc

I have tested the result of ulimit -s on my two computers, running Mac OS X 10.9.5 and 10.13.6, in both cases the result was 8Mb.

@vondele
Copy link
Member

vondele commented Sep 14, 2019

@snicolet the created threads have a different stack size than the limit one sees with ulimit -s. In particular on macOS (512Kb). That's why we have the code. I think @noobpwnftw patch is a good idea.

@snicolet
Copy link
Member

I think too it is a good idea!

@MichaelB7
Copy link
Contributor

@vondele you are correct , I had tried ulimit -s and although it did return 8mb, it did not stop the crashes until you inserted the specific code for macOS.

@vondele
Copy link
Member

vondele commented Sep 14, 2019

I would have one question on this patch, is there a limit on the stack size for 32bit MINGW ? 2Mb might be a limit on some 32 bit systems. See also https://sourceforge.net/p/mingw-w64/bugs/369/#3fc7
In particular, if check my mingw build I see

$ x86_64-w64-mingw32-objdump -x ./stockfish.exe  | grep SizeOfStackReserve
SizeOfStackReserve	0000000000200000

So, I guess we have to check this actually increases stack size as expected.

Note that 2Mb would still be enough to run safely.

@vondele
Copy link
Member

vondele commented Sep 14, 2019

OK, I've checked the patch, and it is fine as is, at least on my system it does reserve 8Mb stack. From my point of view, it would be ready to merge.

@noobpwnftw
Copy link
Contributor Author

noobpwnftw commented Sep 14, 2019

Here are the reasons that I didn't take the Makefile approach:

  • Linux binaries(ELF) and Mac binaries(Mach-O) do not seem have a field to specify thread stack size, instead it is mostly up to their corresponding std::thread implementations, some use their own defaults(like OSX) and some use OS defaults(generic Linux). This patch increases the stack size by explicitly specify a safe value of 8MB.

  • Windows binaries(COFF) have a field to specify default stack size, the original code was using std::thread on MINGW and other compilers like MSVC, which must then rely on linker flags to change defaults. This has a problem since for MINGW, it is possible to compile into a MSYS binary(which would be in ELF format) or a native binary(which would be in COFF format) and even better there exists a method to link with a MSVC linker instead of ld, this would result in 3 different ways of adding linker flags. Provided that MSYS's implementation of std::thread is actually pthread, this patch treats it as if it is on OSX and reuse the logic, saving the trouble of having a complex Makefile.

Personally I use MSVC to compile Stockfish on Windows, it builds out-of-box by importing the source folder into an empty solution, while missing all custom flags(bmi2, sse, etc) from the Makefile and using native std::thread implementation(MSVC doesn't have pthread anyway). This is more than fine compared what you'd normally get after skipping Makefile. Such users should be fully responsible for taking care of missing Makefile flags and other compiler-specific flags(the default 1MB stack size may have problems but fixing that would either need a MSVC project file or introduce platform-dependent code).

@vondele
Copy link
Member

vondele commented Sep 14, 2019

The 1MB of MSCV would be really close to the limit I think. For a search leading to a PV line of depth 245, I'm estimating a stack usage of ~.97MB:

search.cpp:322:6:virtual void Thread::search()	13792	dynamic,bounded
search.cpp:565:9:Value {anonymous}::search(Position&, Search::Stack*, Value, Value, Depth, bool) [with {anonymous}::NodeType NT = (<unnamed>::NodeType)1]	4128	dynamic,bounded
(245*4128+13792)/1024/1024
.97766113281250000000

@noobpwnftw
Copy link
Contributor Author

Yes, in cases of a TB probe, it may go even higher due to some recursion in probing code, that I'm not sure if it is indeed stack intensive, but along with some alignments(the TCEC binary has 4184 on search<>) and overhead from Thread::search(13KB) and intermediate functions, 1MB is problematic.

@snicolet snicolet changed the title On toolchains where pthread is always available, ensure minimum stack… Raise stack size to 8MB for pthreads Sep 14, 2019
@snicolet
Copy link
Member

snicolet commented Sep 14, 2019

I propose to merge the pull request in current form, and we can deal with the MSCV case in a later commit, if necessary (or describe a MSCV project setting precisely in the wiki, if that is the only solution).

@noobpwnftw
Copy link
Contributor Author

noobpwnftw commented Sep 15, 2019

Yes, this PR is fine as-is.

Using MSVC I produced a x64 Release binary and the stack usage is as follows:
image

which makes me think even 2MB is probably not safe, after this PR it'll reserve 8MB on MINGW complies but we might want to document this requirement for all other forms of compiling.

@snicolet
Copy link
Member

By the way, before I merge the pull request, I would like a confirmation that 8MB is not too much for each thread.

a) with 512 threads, that makes 4GB of memory for Stockfish, I suppose that it is peanuts for the monsters with 512 cores?
b) for the average Stockfish fan with small machines, or Stockfish on tablets and phones, isn't the increase a problem?
c) @DragonMist maybe correspondence chess experts launch a dozen of instances of Stockfish at the same time on their machines?

@noobpwnftw
Copy link
Contributor Author

8MB per thread is the default size on almost all 64-bit Linux, while this patch only affects OSX and MINGW32/64 so I think it is pretty harmless.

@snicolet
Copy link
Member

snicolet commented Sep 15, 2019

I beware of MINGW32/64 users these days...

Any opinion on points (b) and (c) above?

@noobpwnftw
Copy link
Contributor Author

noobpwnftw commented Sep 15, 2019

Well, for (b), our fearless MINGW users are on Windows, they probably wouldn't care for a typical 8-core machine having 64MB of stack size. For tablets and phones, they don't use MINGW nor OSX, so they are not affected.

As for (c), time to buy bigger machines. :)

@Alayan-stk-2
Copy link

I don't think c) is a significant concern. Even if you launch multiple stockfish instances, you don't want more instances than your computer has threads. So it will still be 8MB * max threads of the CPU as with a single instance. And this is much less than the hash table will require for long search efficiency.

@vondele
Copy link
Member

vondele commented Sep 16, 2019

I don't think 8MB is a problem wrt to a), b), or c)

SF is not an application that uses 100s of threads, but typically only 1 per core, and consumer hardware typically has >512Mb/core. While 8Mb is more than what is needed (right now ~<2Mb), the value of the patch is that it now has the same value for more OSes (ie. linux default), removing untested corner cases.

@snicolet snicolet closed this in a858def Sep 16, 2019
@snicolet
Copy link
Member

Merged via a858def, thanks!

@noobpwnftw noobpwnftw deleted the threads branch September 16, 2019 13:13
@MJZ1977
Copy link
Contributor

MJZ1977 commented Sep 16, 2019

This PR hase been merged with master without tests. Perhaps we need to do a least a [-3,1] STC test to be 100% sure there is no problem ?

@noobpwnftw
Copy link
Contributor Author

Well, most of the test machines are not affected with the change at all.

@MJZ1977
Copy link
Contributor

MJZ1977 commented Sep 16, 2019

TCEC S16 is showing us that we must be very careful with rare configurations. I don't know how to increase the probability of triggering the modification @Fichtest

@noobpwnftw
Copy link
Contributor Author

For that however, please refer to #2307 for a living example of how bugs affecting all platforms can still slip through.

@snicolet
Copy link
Member

The regression/progression of master after this patch against SF10 is underperforming. I have pushed some [-3..1] tests and will try to tell apart 7- and 8+ cores machines if necessary to estimate the situation better.

@MJZ1977 Thanks Moez!

@noobpwnftw
Copy link
Contributor Author

Removed most 8-core testers and there are a few 4-cores ones triggered by accident, removed them too.

@noobpwnftw
Copy link
Contributor Author

Doubts are fine, paranoia is not.
Recent trend show that people are being excessively conservative on making changes to code, sets of test positions, bounds, etc, as if there is nothing called revert, while being overly confident on sending those "optimized" compiles where actually there is no taking back.

@Alayan-stk-2
Copy link

Doubts are fine, paranoia is not.
Recent trend show that people are being excessively conservative on making changes to code, sets of test positions, bounds, etc, as if there is nothing called revert

I agree. Getting changes for the test positions or bounds considered let alone done is much more complicated than it should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants