Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible data race #2933

Closed
vondele opened this issue Aug 7, 2020 · 3 comments
Closed

Possible data race #2933

vondele opened this issue Aug 7, 2020 · 3 comments

Comments

@vondele
Copy link
Member

vondele commented Aug 7, 2020

While integrating NNUE bench in CI, the thread sanitizer reports a data race, which I presumably have not seen on my by-hand tests:

#2931
https://travis-ci.org/github/official-stockfish/Stockfish/jobs/715961393

like

WARNING: ThreadSanitizer: data race (pid=99505)
  Write of size 8 at 0x7b68000006c0 by thread T5:
    #0 memcpy <null> (libtsan.so.0+0x42333)
    #1 Eval::NNUE::FeatureTransformer::RefreshAccumulator(Position const&) const nnue/nnue_feature_transformer.h:194 (stockfish+0xa591e)
    #2 Eval::NNUE::FeatureTransformer::Transform(Position const&, unsigned char*, bool) const nnue/nnue_feature_transformer.h:82 (stockfish+0xa5549)
    #3 ComputeScore nnue/evaluate_nnue.cpp:136 (stockfish+0xa4f0e)
    #4 Eval::NNUE::evaluate(Position const&) nnue/evaluate_nnue.cpp:162 (stockfish+0xa50d9)
    #5 Eval::evaluate(Position const&) /home/vondele/chess/vondele/Stockfish/src/evaluate.cpp:945 (stockfish+0x21bd8)
    #6 search<(<unnamed>::NodeType)1> /home/vondele/chess/vondele/Stockfish/src/search.cpp:800 (stockfish+0x5a2a7)
    #7 Thread::search() /home/vondele/chess/vondele/Stockfish/src/search.cpp:424 (stockfish+0x55eb4)
    #8 Thread::idle_loop() /home/vondele/chess/vondele/Stockfish/src/thread.cpp:130 (stockfish+0x6f8a8)
    #9 void std::__invoke_impl<void, void (Thread::*)(), Thread*>(std::__invoke_memfun_deref, void (Thread::*&&)(), Thread*&&) /usr/include/c++/9/bits/invoke.h:73 (stockfish+0x7b6cb)
    #10 std::__invoke_result<void (Thread::*)(), Thread*>::type std::__invoke<void (Thread::*)(), Thread*>(void (Thread::*&&)(), Thread*&&) /usr/include/c++/9/bits/invoke.h:95 (stockfish+0x7b58a)
    #11 void std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/9/thread:244 (stockfish+0x7b49c)
    #12 std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> >::operator()() /usr/include/c++/9/thread:251 (stockfish+0x7b441)
    #13 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> > >::_M_run() /usr/include/c++/9/thread:195 (stockfish+0x7b411)
    #14 <null> <null> (libstdc++.so.6+0xd6cb3)

  Previous write of size 8 at 0x7b68000006c0 by thread T4:
    #0 memcpy <null> (libtsan.so.0+0x42333)
    #1 Eval::NNUE::FeatureTransformer::RefreshAccumulator(Position const&) const nnue/nnue_feature_transformer.h:194 (stockfish+0xa591e)
    #2 Eval::NNUE::FeatureTransformer::Transform(Position const&, unsigned char*, bool) const nnue/nnue_feature_transformer.h:82 (stockfish+0xa5549)
    #3 ComputeScore nnue/evaluate_nnue.cpp:136 (stockfish+0xa4f0e)
    #4 Eval::NNUE::evaluate(Position const&) nnue/evaluate_nnue.cpp:162 (stockfish+0xa50d9)
    #5 Eval::evaluate(Position const&) /home/vondele/chess/vondele/Stockfish/src/evaluate.cpp:945 (stockfish+0x21bd8)
    #6 search<(<unnamed>::NodeType)1> /home/vondele/chess/vondele/Stockfish/src/search.cpp:800 (stockfish+0x5a2a7)
    #7 Thread::search() /home/vondele/chess/vondele/Stockfish/src/search.cpp:424 (stockfish+0x55eb4)
    #8 MainThread::search() /home/vondele/chess/vondele/Stockfish/src/search.cpp:240 (stockfish+0x54355)
    #9 Thread::idle_loop() /home/vondele/chess/vondele/Stockfish/src/thread.cpp:130 (stockfish+0x6f8a8)
    #10 void std::__invoke_impl<void, void (Thread::*)(), Thread*>(std::__invoke_memfun_deref, void (Thread::*&&)(), Thread*&&) /usr/include/c++/9/bits/invoke.h:73 (stockfish+0x7b6cb)
    #11 std::__invoke_result<void (Thread::*)(), Thread*>::type std::__invoke<void (Thread::*)(), Thread*>(void (Thread::*&&)(), Thread*&&) /usr/include/c++/9/bits/invoke.h:95 (stockfish+0x7b58a)
    #12 void std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) /usr/include/c++/9/thread:244 (stockfish+0x7b49c)
    #13 std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> >::operator()() /usr/include/c++/9/thread:251 (stockfish+0x7b441)
    #14 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (Thread::*)(), Thread*> > >::_M_run() /usr/include/c++/9/thread:195 (stockfish+0x7b411)
    #15 <null> <null> (libstdc++.so.6+0xd6cb3)

  Location is heap block of size 1312 at 0x7b6800000600 allocated by main thread:
    #0 operator new(unsigned long, std::align_val_t) <null> (libtsan.so.0+0x8c2b5)
    #1 __gnu_cxx::new_allocator<StateInfo>::allocate(unsigned long, void const*) /usr/include/c++/9/ext/new_allocator.h:111 (stockfish+0x83721)
    #2 std::allocator_traits<std::allocator<StateInfo> >::allocate(std::allocator<StateInfo>&, unsigned long) /usr/include/c++/9/bits/alloc_traits.h:444 (stockfish+0x831e2)
    #3 std::_Deque_base<StateInfo, std::allocator<StateInfo> >::_M_allocate_node() /usr/include/c++/9/bits/stl_deque.h:620 (stockfish+0x82560)
    #4 std::_Deque_base<StateInfo, std::allocator<StateInfo> >::_M_create_nodes(StateInfo**, StateInfo**) /usr/include/c++/9/bits/stl_deque.h:745 (stockfish+0x82cb6)
    #5 std::_Deque_base<StateInfo, std::allocator<StateInfo> >::_M_initialize_map(unsigned long) /usr/include/c++/9/bits/stl_deque.h:719 (stockfish+0x82225)
    #6 std::_Deque_base<StateInfo, std::allocator<StateInfo> >::_Deque_base(std::allocator<StateInfo> const&, unsigned long) /usr/include/c++/9/bits/stl_deque.h:518 (stockfish+0x81837)
    #7 std::deque<StateInfo, std::allocator<StateInfo> >::deque(unsigned long, std::allocator<StateInfo> const&) /usr/include/c++/9/bits/stl_deque.h:936 (stockfish+0x81435)
    #8 position /home/vondele/chess/vondele/Stockfish/src/uci.cpp:68 (stockfish+0x7ed22)
    #9 bench /home/vondele/chess/vondele/Stockfish/src/uci.cpp:184 (stockfish+0x7fc85)
    #10 UCI::loop(int, char**) /home/vondele/chess/vondele/Stockfish/src/uci.cpp:274 (stockfish+0x80636)
    #11 main /home/vondele/chess/vondele/Stockfish/src/main.cpp:49 (stockfish+0x36c79)

@nodchip @dorzechowski @mstembera any idea if there are data structures shared between the threads that should not be. It appears to refer to pos.state()->accumulator

@vondele vondele added the NNUE label Aug 7, 2020
@vondele
Copy link
Member Author

vondele commented Aug 7, 2020

To reproduce outside of CI, try:

make clean && make -j2 ARCH=x86-64 sanitize=thread    optimize=no debug=yes build

cat << EOF > tsan.supp
race:TTEntry::move
race:TTEntry::depth
race:TTEntry::bound
race:TTEntry::save
race:TTEntry::value
race:TTEntry::eval
race:TTEntry::is_pv
race:TranspositionTable::probe
race:TranspositionTable::hashfull
EOF

export TSAN_OPTIONS="suppressions=./tsan.supp"

cat << EOF > inp
setoption name Use NNUE value true
bench 16 2 13 default depth
EOF

./stockfish < inp

@nodchip
Copy link

nodchip commented Aug 8, 2020

#8 position /home/vondele/chess/vondele/Stockfish/src/uci.cpp:68 (stockfish+0x7ed22)

This shows that the memory which occurs race condition was allocated in the line 64 of uci.cpp.

states = StateListPtr(new std::deque<StateInfo>(1)); // Drop old and create a new one

The address of the first element seems to be shared between threads in the line 217 of thread.cpp.

th->rootPos.set(pos.fen(), pos.is_chess960(), &setupStates->back(), th);

I think that this may cause data race.

@vondele
Copy link
Member Author

vondele commented Aug 8, 2020

I think the underlying reason is that previously the state for position rootpos was readonly, now it seems it is not. See near line 208 of thread.cpp:

  // We use Position::set() to set root position across threads. But there are
  // some StateInfo fields (previous, pliesFromNull, capturedPiece) that cannot
  // be deduced from a fen string, so set() clears them and to not lose the info
  // we need to backup and later restore setupStates->back(). Note that setupStates
  // is shared by threads but is accessed in read-only mode.
  StateInfo tmp = setupStates->back();

I won't be able to look into that soon, so if somebody comes up with a solution.

vondele added a commit to vondele/Stockfish that referenced this issue Aug 9, 2020
the stateInfo at the rootPos is no longer read-only, as the NNUE accumulator is part of it.
Threads can thus not share this object and need their own copy.

tested for no regression
https://tests.stockfishchess.org/tests/view/5f3022239081672066536bce
LLR: 2.96 (-2.94,2.94) {-1.50,0.50}
Total: 52800 W: 6843 L: 6802 D: 39155
Ptnml(0-2): 336, 4646, 16399, 4679, 340

fixes official-stockfish#2933

No functional change
@vondele vondele closed this as completed in 27b593a Aug 9, 2020
joergoster pushed a commit to joergoster/Stockfish-old that referenced this issue Aug 10, 2020
the stateInfo at the rootPos is no longer read-only, as the NNUE accumulator is part of it.
Threads can thus not share this object and need their own copy.

tested for no regression
https://tests.stockfishchess.org/tests/view/5f3022239081672066536bce
LLR: 2.96 (-2.94,2.94) {-1.50,0.50}
Total: 52800 W: 6843 L: 6802 D: 39155
Ptnml(0-2): 336, 4646, 16399, 4679, 340

closes official-stockfish/Stockfish#2957

fixes official-stockfish/Stockfish#2933

No functional change
lucabrivio pushed a commit to lucabrivio/Stockfish that referenced this issue Aug 10, 2020
the stateInfo at the rootPos is no longer read-only, as the NNUE accumulator is part of it.
Threads can thus not share this object and need their own copy.

tested for no regression
https://tests.stockfishchess.org/tests/view/5f3022239081672066536bce
LLR: 2.96 (-2.94,2.94) {-1.50,0.50}
Total: 52800 W: 6843 L: 6802 D: 39155
Ptnml(0-2): 336, 4646, 16399, 4679, 340

closes official-stockfish/Stockfish#2957

fixes official-stockfish/Stockfish#2933

No functional change
noobpwnftw pushed a commit to noobpwnftw/Stockfish that referenced this issue Aug 15, 2020
the stateInfo at the rootPos is no longer read-only, as the NNUE accumulator is part of it.
Threads can thus not share this object and need their own copy.

tested for no regression
https://tests.stockfishchess.org/tests/view/5f3022239081672066536bce
LLR: 2.96 (-2.94,2.94) {-1.50,0.50}
Total: 52800 W: 6843 L: 6802 D: 39155
Ptnml(0-2): 336, 4646, 16399, 4679, 340

closes official-stockfish#2957

fixes official-stockfish#2933

No functional change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants