[WIP] Defer data preparation to native code. Use sparse input tensors. #1
Conversation
…lescing as it's faster without.
Segfault on running:
Likely backtrace:
Okay, the stream is destroyed while the async lambda is still running.
Also, on Linux/GCC the build seems to be unoptimized by default, and the compiler complains about the cdecl attribute:
Maybe add:
$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index b973f06..7e0dbdb 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,6 +2,13 @@ cmake_minimum_required(VERSION 3.0)
 project(data_loader)
 
+if(NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE Release)
+endif()
+
+set(CMAKE_CXX_FLAGS_DEBUG "-g")
+set(CMAKE_CXX_FLAGS_RELEASE "-O3")
+
 set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_CXX_STANDARD_REQUIRED 17)
This yields line numbers for the segfault, but the trace is different:
…m. Make the future batch a part of the stream.
I think the crash is fixed now; it could have produced many different backtraces. As for the CMake changes, I'll change Release to RelWithDebInfo, but I'll refrain from adding GCC/Clang-specific flags, as this is handled by a properly set up CMAKE_BUILD_TYPE.
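For reference, a minimal sketch of the fix as a Python analogue of the C++ change (the class and method names here are hypothetical, not the repo's actual API): the stream owns the future for the batch being prepared, so tearing the stream down waits for the in-flight work instead of racing with it.

import concurrent.futures

class Stream:
    def __init__(self):
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._future = None  # the in-flight batch is now part of the stream

    def _prepare_batch(self):
        # stand-in for the native batch preparation work
        return [0] * 8192

    def _request_next(self):
        # schedule preparation of the next batch asynchronously
        self._future = self._executor.submit(self._prepare_batch)

    def next_batch(self):
        if self._future is None:
            self._request_next()
        batch = self._future.result()  # wait for the async work to finish
        self._request_next()           # immediately start preparing the next one
        return batch

    def close(self):
        # waiting here is what prevents the "destroyed while the async
        # lambda is still running" crash: no task can outlive the stream
        self._executor.shutdown(wait=True)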
At first sight, no segfaults with training. Speed is low, though. Adding …
Workers won't work; it has to be 0. This is because of (1) the sparse tensors, and (2) we would need to handle it explicitly in our code. 36 it/s is 600k positions/s, nice.
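As a sanity check on the numbers: at an assumed batch size of 16384, 36 it/s works out to roughly 590k positions/s, consistent with the 600k figure. A hedged sketch of the wiring, assuming a hypothetical wrapper around the native stream (not the repo's actual API):

import torch

class NativeBatchDataset(torch.utils.data.IterableDataset):
    def __init__(self, stream):
        self.stream = stream  # the ctypes-backed native stream object

    def __iter__(self):
        # batches arrive fully formed from native code
        while True:
            yield self.stream.next_batch()

# The stream already yields whole batches, so batch_size=None disables
# the DataLoader's own batching. num_workers must stay 0: worker
# processes cannot share the native stream handle, and the sparse
# tensors would need explicit handling to cross the worker boundary.
# loader = torch.utils.data.DataLoader(NativeBatchDataset(stream),
#                                      batch_size=None, num_workers=0)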
I've added the flags explicitly. It's weird that they are not added automatically. They are ignored by MSVC, so no guarding is needed.
Do epochs still finish? It is at iteration 67000 and still at epoch 0.
Epochs won't finish. Normally with PyTorch Lightning an epoch ends when the dataloader finishes, but since there's only one data loader I made it cycle (infinitely) through the data. This is until the proper training data setup is done. IMO epochs shouldn't be tied to file size anyway, because then it's hard to control their length.
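One way to get epoch boundaries back without tying them to file size, sketched under the same hypothetical-wrapper assumption as above: cap the number of batches per epoch, so Lightning sees the dataloader end even though the underlying stream cycles forever.

import torch

class EpochSizedDataset(torch.utils.data.IterableDataset):
    def __init__(self, stream, batches_per_epoch):
        self.stream = stream
        self.batches_per_epoch = batches_per_epoch  # chosen freely, not by file size

    def __iter__(self):
        # the native stream never raises StopIteration; this iterator does,
        # after a fixed number of batches, which ends the Lightning epoch
        for _ in range(self.batches_per_epoch):
            yield self.stream.next_batch()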
If you clean up, maybe using something like:
diff --git a/external_nnue_data.py b/external_nnue_data.py
index 1102500..20c50df 100644
--- a/external_nnue_data.py
+++ b/external_nnue_data.py
@@ -2,7 +2,10 @@ import numpy as np
 import os
 import ctypes
 
-dll = ctypes.CDLL('c:/dev/nnue-pytorch/data_loader.dll')
+try:
+    dll = ctypes.CDLL('c:/dev/nnue-pytorch/data_loader.dll')
+except:
+    dll = ctypes.cdll.LoadLibrary('./libdata_loader.so')
would be enough to make it run on both Linux and Windows out of the box?
@vondele I glob for either .dll or .so; I think that should be enough, and it doesn't rely on exceptions from ctypes. Overall I cleaned this up. I removed the old data loading, as it's not used anymore. halfkp.py now only contains constants. Tensor storage is decoupled from the feature set; the feature set is identified by a string name and can be chosen dynamically when creating a data stream. Renamed a lot of things. The test (demo) is cleaned up. @glinscott this should be ready for a final review and merge.
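A hedged sketch of that glob-based lookup (the search path and library base name are assumptions, not the repo's exact code): it finds whichever of data_loader.dll or libdata_loader.so was built, with no reliance on ctypes exceptions.

import ctypes
import glob
import os

def load_data_loader_lib(search_dir='.'):
    # matches data_loader.dll on Windows and libdata_loader.so on Linux
    candidates = (glob.glob(os.path.join(search_dir, '*data_loader.dll'))
                  + glob.glob(os.path.join(search_dir, '*data_loader.so')))
    if not candidates:
        raise FileNotFoundError('no data_loader shared library found in ' + search_dir)
    return ctypes.CDLL(os.path.abspath(candidates[0]))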
@Sopel97 doesn't compile for me:
Okay, it seems like GCC is unable to resolve some dependent names. I fixed it.
Yes, it compiles now, but it doesn't run:
The lib is named: …
Okay, fixed now. |
OK, runs... |
Can you try experimenting a bit with the batch size and see what is best for you? For me 8192 seems optimal; maybe twice that is slightly faster, and with more I run out of memory.
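For anyone repeating this, a quick way to measure throughput per batch size; make_loader here is a hypothetical factory that builds the data loader for a given batch size.

import time

def positions_per_second(make_loader, batch_size, n_batches=100):
    loader = iter(make_loader(batch_size))
    next(loader)  # warm-up batch, excludes one-time startup cost
    start = time.perf_counter()
    for _ in range(n_batches):
        next(loader)
    elapsed = time.perf_counter() - start
    return n_batches * batch_size / elapsed

# e.g. compare a few candidate sizes:
# for size in (4096, 8192, 16384, 32768):
#     print(size, positions_per_second(make_loader, size))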
This will of course depend a bit on the GPU. I wonder what the bottleneck will be on a real GPU server (say 4xV100 or similar). |
So it looks like 4×8192 works best on your system. Thanks for testing this.
That's also at about 87% GPU utilization, so it's probably limited by the GPU (an RTX 2070 Super) in this case.
@@ -1,132 +0,0 @@
import chess
Can we leave this one around for environments where we can't compile the C++ version?
Alternatively, we could ship binaries...
You know what, I'll bring back the old data loader tomorrow and make it sparse, just so it's there, because you have a point. I think it's a matter for another PR though.
Sure, that's fine! Let's get this rolling, it's a gigantic improvement :).
This is some incredible work, thank you so much! Just a few small comments.
Co-authored-by: Gary Linscott <glinscott@gmail.com>
I'll leave this here for now, because it needs a lot of cleanup before merging. It would be nice if it were tested on different platforms. To get it to run on Linux, one needs to change the name of the lib loaded by ctypes from *.dll to *.so.
This introduces a module written in C++ that loads batched training data from a file (currently it cycles through the file, because StopIteration is broken). It's compiled with CMake into a shared library and used from Python through ctypes. A thin Python layer on top mimics the native structures and converts the raw arrays to tensors.
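A hedged sketch of that thin layer (the structure name and fields are illustrative, not the actual ABI): the native side fills flat index/value arrays for the active features, and Python wraps them as a sparse COO tensor.

import ctypes
import numpy as np
import torch

class SparseBatch(ctypes.Structure):
    _fields_ = [
        ('size', ctypes.c_int),                      # number of positions
        ('num_active', ctypes.c_int),                # total active features
        ('indices', ctypes.POINTER(ctypes.c_int)),   # 2 x num_active: (position, feature)
        ('values', ctypes.POINTER(ctypes.c_float)),  # one value per active feature
    ]

    def to_sparse_tensor(self, num_features):
        idx = np.ctypeslib.as_array(self.indices, shape=(2, self.num_active))
        val = np.ctypeslib.as_array(self.values, shape=(self.num_active,))
        # copy so the tensor owns its memory after the native batch is freed
        return torch.sparse_coo_tensor(
            torch.from_numpy(idx.copy()).long(),
            torch.from_numpy(val.copy()),
            (self.size, num_features))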
With this I'm reaching 80-85k positions/s on my GPU, compared to the previous (master) 1.6k/s.