
[WIP] Defer data preparation to native code. Use sparse input tensors. #1

Merged · 23 commits · Nov 8, 2020

Conversation

@Sopel97 (Member) commented Nov 8, 2020

I'll leave this here for now, because it needs a lot of cleanup before merging. It would be nice if it were tested on different platforms. To get it running on Linux, one needs to change the name of the library loaded by ctypes from *.dll to *.so.

This introduces a module written in C++ that loads batched training data from a file (currently it cycles through the file because StopIteration is broken). It's compiled with CMake into a shared library and used from Python through ctypes. A thin Python layer on top mimics the underlying structures and converts the raw arrays to tensors.
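
A minimal sketch of the ctypes glue being described, assuming the exported C API that appears later in this thread (create_sparse_batch_stream, fetch_next_sparse_batch); the struct fields and file names here are illustrative, not the PR's exact layout:

import ctypes
import sys

# Platform-dependent library name: the *.dll vs *.so issue mentioned above.
lib_path = './training_data_loader.dll' if sys.platform == 'win32' else './libtraining_data_loader.so'
dll = ctypes.cdll.LoadLibrary(lib_path)

class SparseBatch(ctypes.Structure):
    # Illustrative fields; the real struct mirrors the C++ definition.
    _fields_ = [
        ('num_entries', ctypes.c_int),
        ('outcome', ctypes.POINTER(ctypes.c_float)),
    ]

dll.create_sparse_batch_stream.restype = ctypes.c_void_p
dll.create_sparse_batch_stream.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int, ctypes.c_bool]
dll.fetch_next_sparse_batch.restype = ctypes.POINTER(SparseBatch)
dll.fetch_next_sparse_batch.argtypes = [ctypes.c_void_p]

stream = dll.create_sparse_batch_stream(b'HalfKP', b'training_data.bin', 8192, True)
batch = dll.fetch_next_sparse_batch(stream).contents  # raw arrays get converted to tensors from here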

With this I'm reaching 80-85k positions/s on my GPU, compared to previous (master) 1.6k/s.

@vondele (Member) commented Nov 8, 2020

Segfault on running:

$ python3 test_dll_call.py
<CDLL './libdata_loader.so', handle 283c450 at 0x7f5a67f28eb0>
<_FuncPtr object at 0x7f5a67e5e400>
test successful
<_FuncPtr object at 0x7f5a5e058400>
<_FuncPtr object at 0x7f5a5e058040>
<__main__.LP_TestDataCollection object at 0x7f5a166f57c0>
10 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
73799056
 0.179 seconds
73799056
 0.618 seconds
74134720
 2.812 seconds
Segmentation fault (core dumped)

Likely backtrace:

$ gdb libdata_loader.so core
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from libdata_loader.so...
(No debugging symbols found in libdata_loader.so)

warning: core file may not match specified executable file.
[New LWP 25766]
[New LWP 25342]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3 test_dll_call.py'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5a152cc135 in get_next_entry_halfkp_sparse_batch::{lambda()#1}::operator()() const () from ./libdata_loader.so
[Current thread is 1 (Thread 0x7f5a4e0c7700 (LWP 25766))]
(gdb) bt
#0  0x00007f5a152cc135 in get_next_entry_halfkp_sparse_batch::{lambda()#1}::operator()() const () from ./libdata_loader.so
#1  0x00007f5a152ce630 in TrainingEntryHalfKPSparseBatch* std::__invoke_impl<TrainingEntryHalfKPSparseBatch*, get_next_entry_halfkp_sparse_batch::{lambda()#1}>(std::__invoke_other, get_next_entry_halfkp_sparse_batch::{lambda()#1}&&) () from ./libdata_loader.so
#2  0x00007f5a152ce5d8 in std::__invoke_result<get_next_entry_halfkp_sparse_batch::{lambda()#1}>::type std::__invoke<get_next_entry_halfkp_sparse_batch::{lambda()#1}>(std::__invoke_result&&, (get_next_entry_halfkp_sparse_batch::{lambda()#1}&&)...) () from ./libdata_loader.so
#3  0x00007f5a152ce56c in TrainingEntryHalfKPSparseBatch* std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) () from ./libdata_loader.so
#4  0x00007f5a152ce4ec in std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch::{lambda()#1}> >::operator()() () from ./libdata_loader.so
#5  0x00007f5a152ce2a0 in std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<TrainingEntryHalfKPSparseBatch*>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch::{lambda()#1}> >, TrainingEntryHalfKPSparseBatch*>::operator()() const () from ./libdata_loader.so
#6  0x00007f5a152cdfab in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<TrainingEntryHalfKPSparseBatch*>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch::{lambda()#1}> >, TrainingEntryHalfKPSparseBatch*> >::_M_invoke(std::_Any_data const&) ()
   from ./libdata_loader.so
#7  0x00007f5a152da99e in std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const () from ./libdata_loader.so
#8  0x00007f5a152cfe50 in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) () from ./libdata_loader.so
#9  0x00007f5a152deec7 in void std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) () from ./libdata_loader.so
#10 0x00007f5a152dd2ed in std::__invoke_result<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>::type std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) () from ./libdata_loader.so
#11 0x00007f5a152da6d6 in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&)::{lambda()#1}::operator()() const () from ./libdata_loader.so
#12 0x00007f5a152da70d in std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::once_flag&, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__futu [...]

@Sopel97 (Member, Author) commented Nov 8, 2020

Okay, the stream is destroyed while the async lambda is still running.

@vondele (Member) commented Nov 8, 2020

Also, on Linux/GCC the build seems to be unoptimized by default, and the compiler complains about cdecl:

[ 50%] Building CXX object CMakeFiles/data_loader.dir/data_loader.cpp.o
/usr/bin/c++  -Ddata_loader_EXPORTS  -fPIC   -std=gnu++17 -o CMakeFiles/data_loader.dir/data_loader.cpp.o -c /home/vondele/chess/nnue-pytorch/data_loader.cpp
/home/vondele/chess/nnue-pytorch/data_loader.cpp:354:28: warning: ‘cdecl’ attribute ignored [-Wattributes]
  354 |     EXPORT void CDECL test()
      |                            ^

@vondele (Member) commented Nov 8, 2020

Maybe add:

$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index b973f06..7e0dbdb 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -2,6 +2,13 @@ cmake_minimum_required(VERSION 3.0)
 
 project(data_loader)
 
+if(NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE Release)
+endif()
+
+set(CMAKE_CXX_FLAGS_DEBUG "-g")
+set(CMAKE_CXX_FLAGS_RELEASE "-O3")
+
 set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_CXX_STANDARD_REQUIRED 17)
 

This yields line numbers for the segfault, but the trace is different:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f160b0c0fe7 in std::basic_fstream<char, std::char_traits<char> >::operator=(std::basic_fstream<char, std::char_traits<char> >&&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
[Current thread is 1 (Thread 0x7f161dfbf700 (LWP 28191))]
(gdb) bt
#0  0x00007f160b0c0fe7 in std::basic_fstream<char, std::char_traits<char> >::operator=(std::basic_fstream<char, std::char_traits<char> >&&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x00007f160b1d0579 in training_data::BinSfenInputStream::next (this=0x37bf0b0) at /home/vondele/chess/nnue-pytorch/training_data_stream.h:70
#2  0x00007f160b1c414e in <lambda()>::operator()(void) const (__closure=0x37d9358) at /home/vondele/chess/nnue-pytorch/data_loader.cpp:478
#3  0x00007f160b1c6630 in std::__invoke_impl<TrainingEntryHalfKPSparseBatch*, get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> >(std::__invoke_other, <lambda()> &&) (__f=...)
    at /usr/include/c++/9/bits/invoke.h:60
#4  0x00007f160b1c65d8 in std::__invoke<get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> >(<lambda()> &&) (__fn=...) at /usr/include/c++/9/bits/invoke.h:95
#5  0x00007f160b1c656c in std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x37d9358)
    at /usr/include/c++/9/thread:244
#6  0x00007f160b1c64ec in std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> > >::operator()(void) (this=0x37d9358)
    at /usr/include/c++/9/thread:251
#7  0x00007f160b1c62a0 in std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<TrainingEntryHalfKPSparseBatch*>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> > >, TrainingEntryHalfKPSparseBatch*>::operator()(void) const (this=0x7f161dfbedf0)
    at /usr/include/c++/9/future:1339
#8  0x00007f160b1c5fab in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>(), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<TrainingEntryHalfKPSparseBatch*>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<get_next_entry_halfkp_sparse_batch(InputStreamHandle*, int)::<lambda()> > >, TrainingEntryHalfKPSparseBatch*> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/9/bits/std_function.h:286
#9  0x00007f160b1d299e in std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (this=0x7f161dfbedf0)
    at /usr/include/c++/9/bits/std_function.h:688
#10 0x00007f160b1c7e50 in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (
    this=0x37d9320, __f=0x7f161dfbedf0, __did_set=0x7f161dfbed4f) at /usr/include/c++/9/future:561
[...]

…m. Make the future batch a part of the stream.
@Sopel97 (Member, Author) commented Nov 8, 2020

I think the crash is fixed now. It could have produced many different backtraces.

As for the CMake changes, I'll change Release to RelWithDebInfo, but I'll refrain from adding gcc/clang-specific flags; that is handled by a properly set CMAKE_BUILD_TYPE.

@vondele (Member) commented Nov 8, 2020

python3 test_dll_call.py now works without a segfault. The build is still at '-O0', though.

@vondele (Member) commented Nov 8, 2020

At first sight, no segfaults with training. Speed is low, Epoch 0: : 1144it [04:24, 4.33it/s, loss=0.006, v_num=24], with GPU load about 10%. I'll try manually adding -O3 to the build, and after that see if I can add num_workers=32 or so.

Adding -O3 is much faster: 85% GPU load and Epoch 0: : 1825it [00:50, 35.92it/s, loss=0.004, v_num=25]

@Sopel97 (Member, Author) commented Nov 8, 2020

Workers won't work; num_workers has to be 0. This is because (1) we use sparse tensors, and (2) we would need to handle multiple workers explicitly in our code.

36 it/s is about 600k positions/s, nice.
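
For context on the sparse tensors: on the Python side each batch becomes a sparse COO tensor, built roughly like this (a sketch; shapes and indices assumed, 41024 being the HalfKP input dimension):

import torch

# One (sample, feature) pair per active feature; all values are 1.0.
# Row 0 holds sample indices within the batch, row 1 the feature indices.
indices = torch.tensor([[0, 0, 1],
                        [10, 302, 7]])
values = torch.ones(indices.shape[1])
batch_size, num_features = 2, 41024
x = torch.sparse_coo_tensor(indices, values, (batch_size, num_features))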

@Sopel97 (Member, Author) commented Nov 8, 2020

I've added the flags explicitly. It's weird that they are not added automatically. They are ignored by MSVC, so no guarding is needed.

@vondele (Member) commented Nov 8, 2020

Do epochs still finish? It's at iteration 67000 and still at Epoch 0.

@Sopel97 (Member, Author) commented Nov 8, 2020

Epochs won't finish. Normally with PyTorch Lightning an epoch ends when the dataloader is exhausted, but since there's only one data loader I made it cycle through the data indefinitely. This is until the proper training data setup is done. IMO epochs shouldn't be tied to file size anyway, because then it's hard to control their length.
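
A minimal sketch of that cyclic behaviour from the Python side, assuming a hypothetical fetch callable that wraps the native stream; since the iterator never ends, an epoch boundary has to be imposed by the trainer (e.g. Lightning's limit_train_batches or max_steps) rather than by file size:

import torch

class CyclicBatchStream(torch.utils.data.IterableDataset):
    """Yields batches forever: the native loader wraps around to the start
    of the file at EOF, so StopIteration is never raised."""

    def __init__(self, fetch_next_batch):
        super().__init__()
        self._fetch_next_batch = fetch_next_batch  # calls into the native lib

    def __iter__(self):
        while True:
            yield self._fetch_next_batch()

# As noted above, num_workers must stay 0; batching is already done natively.
# loader = torch.utils.data.DataLoader(CyclicBatchStream(fetch), batch_size=None, num_workers=0)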

@vondele (Member) commented Nov 8, 2020

If you clean up, maybe using something like:

diff --git a/external_nnue_data.py b/external_nnue_data.py
index 1102500..20c50df 100644
--- a/external_nnue_data.py
+++ b/external_nnue_data.py
@@ -2,7 +2,10 @@ import numpy as np
 import os
 import ctypes
 
-dll = ctypes.CDLL('c:/dev/nnue-pytorch/data_loader.dll')
+try:
+  dll = ctypes.CDLL('c:/dev/nnue-pytorch/data_loader.dll')
+except:
+  dll = ctypes.cdll.LoadLibrary('./libdata_loader.so')

would be enough to make it run on both Linux and Windows out of the box?

@Sopel97 (Member, Author) commented Nov 8, 2020

@vondele I glob for either .dll or .so. I think that should be enough, and it doesn't rely on exceptions from ctypes.
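
The glob approach amounts to something like this (patterns illustrative; the error message matches the one seen later in the thread):

import ctypes
import glob

# Accept either the Windows (.dll) or the Linux (.so) build artifact.
candidates = glob.glob('./*data_loader.dll') + glob.glob('./*data_loader.so')
if not candidates:
    raise FileNotFoundError('Cannot find data_loader shared library.')
dll = ctypes.cdll.LoadLibrary(candidates[0])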

Overall, I've cleaned this up. I removed the old data loading code as it's not used anymore; halfkp.py now only contains constants. Tensor storage is decoupled from the feature set; the feature set is identified by a string name and can be chosen dynamically when creating a data stream. I changed a lot of names, cleaned up the test (demo), and added a lib folder for the training data format library.

@glinscott should be ready for a final review and merge

Sopel97 marked this pull request as ready for review on November 8, 2020, 18:42
@vondele (Member) commented Nov 8, 2020

@Sopel97 it doesn't compile for me:

$ make -j VERBOSE=1
/usr/bin/cmake -S/home/vondele/chess/nnue-pytorch -B/home/vondele/chess/nnue-pytorch --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/vondele/chess/nnue-pytorch/CMakeFiles /home/vondele/chess/nnue-pytorch/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/vondele/chess/nnue-pytorch'
make -f CMakeFiles/training_data_loader.dir/build.make CMakeFiles/training_data_loader.dir/depend
make[2]: Entering directory '/home/vondele/chess/nnue-pytorch'
cd /home/vondele/chess/nnue-pytorch && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/vondele/chess/nnue-pytorch /home/vondele/chess/nnue-pytorch /home/vondele/chess/nnue-pytorch /home/vondele/chess/nnue-pytorch /home/vondele/chess/nnue-pytorch/CMakeFiles/training_data_loader.dir/DependInfo.cmake --color=
make[2]: Leaving directory '/home/vondele/chess/nnue-pytorch'
make -f CMakeFiles/training_data_loader.dir/build.make CMakeFiles/training_data_loader.dir/build
make[2]: Entering directory '/home/vondele/chess/nnue-pytorch'
[ 50%] Building CXX object CMakeFiles/training_data_loader.dir/training_data_loader.cpp.o
/usr/bin/c++  -Dtraining_data_loader_EXPORTS  -O3 -fPIC   -std=gnu++17 -o CMakeFiles/training_data_loader.dir/training_data_loader.cpp.o -c /home/vondele/chess/nnue-pytorch/training_data_loader.cpp
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In constructor ‘AsyncStream<StorageT>::AsyncStream(const char*, bool)’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:354:9: error: class ‘AsyncStream<StorageT>’ does not have any field named ‘Stream’
  354 |         Stream(filename, cyclic)
      |         ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In constructor ‘FeaturedEntryStream<FeatureSetT, StorageT>::FeaturedEntryStream(const char*, bool)’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:378:9: error: class ‘FeaturedEntryStream<FeatureSetT, StorageT>’ does not have any field named ‘Stream’
  378 |         Stream(filename, cyclic)
      |         ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In member function ‘StorageT* FeaturedEntryStream<FeatureSetT, StorageT>::next()’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:384:22: error: ‘m_stream’ was not declared in this scope; did you mean ‘Stream’?
  384 |         auto value = m_stream->next();
      |                      ^~~~~~~~
      |                      Stream
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In constructor ‘FeaturedBatchStream<FeatureSetT, StorageT>::FeaturedBatchStream(const char*, int, bool)’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:404:9: error: class ‘FeaturedBatchStream<FeatureSetT, StorageT>’ does not have any field named ‘AsyncStream’
  404 |         AsyncStream(filename, cyclic),
      |         ^~~~~~~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In member function ‘StorageT* FeaturedBatchStream<FeatureSetT, StorageT>::next()’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:414:34: error: ‘m_next’ was not declared in this scope; did you mean ‘next’?
  414 |             auto cur = std::move(m_next);
      |                                  ^~~~~~
      |                                  next
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In lambda function:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:427:34: error: ‘m_stream’ was not declared in this scope; did you mean ‘Stream’?
  427 |                     auto value = m_stream->next();
      |                                  ^~~~~~~~
      |                                  Stream
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: At global scope:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:467:122: warning: ‘cdecl’ attribute ignored [-Wattributes]
  467 |     EXPORT Stream<DenseEntry>* CDECL create_dense_entry_stream(const char* feature_set, const char* filename, bool cyclic)
      |                                                                                                                          ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:472:124: warning: ‘cdecl’ attribute ignored [-Wattributes]
  472 |     EXPORT Stream<SparseEntry>* CDECL create_sparse_entry_stream(const char* feature_set, const char* filename, bool cyclic)
      |                                                                                                                            ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:477:138: warning: ‘cdecl’ attribute ignored [-Wattributes]
  477 |     EXPORT Stream<DenseBatch>* CDECL create_dense_batch_stream(const char* feature_set, const char* filename, int batch_size, bool cyclic)
      |                                                                                                                                          ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:482:140: warning: ‘cdecl’ attribute ignored [-Wattributes]
  482 |     EXPORT Stream<SparseBatch>* CDECL create_sparse_batch_stream(const char* feature_set, const char* filename, int batch_size, bool cyclic)
      |                                                                                                                                            ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:487:76: warning: ‘cdecl’ attribute ignored [-Wattributes]
  487 |     EXPORT void CDECL destroy_dense_entry_stream(Stream<DenseEntry>* stream)
      |                                                                            ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:492:78: warning: ‘cdecl’ attribute ignored [-Wattributes]
  492 |     EXPORT void CDECL destroy_sparse_entry_stream(Stream<SparseEntry>* stream)
      |                                                                              ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:497:76: warning: ‘cdecl’ attribute ignored [-Wattributes]
  497 |     EXPORT void CDECL destroy_dense_batch_stream(Stream<DenseBatch>* stream)
      |                                                                            ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:502:78: warning: ‘cdecl’ attribute ignored [-Wattributes]
  502 |     EXPORT void CDECL destroy_sparse_batch_stream(Stream<SparseBatch>* stream)
      |                                                                              ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:507:79: warning: ‘cdecl’ attribute ignored [-Wattributes]
  507 |     EXPORT DenseEntry* CDECL fetch_next_dense_entry(Stream<DenseEntry>* stream)
      |                                                                               ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:512:82: warning: ‘cdecl’ attribute ignored [-Wattributes]
  512 |     EXPORT SparseEntry* CDECL fetch_next_sparse_entry(Stream<SparseEntry>* stream)
      |                                                                                  ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:517:79: warning: ‘cdecl’ attribute ignored [-Wattributes]
  517 |     EXPORT DenseBatch* CDECL fetch_next_dense_batch(Stream<DenseBatch>* stream)
      |                                                                               ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:522:82: warning: ‘cdecl’ attribute ignored [-Wattributes]
  522 |     EXPORT SparseBatch* CDECL fetch_next_sparse_batch(Stream<SparseBatch>* stream)
      |                                                                                  ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:527:56: warning: ‘cdecl’ attribute ignored [-Wattributes]
  527 |     EXPORT void CDECL destroy_dense_entry(DenseEntry* e)
      |                                                        ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:532:58: warning: ‘cdecl’ attribute ignored [-Wattributes]
  532 |     EXPORT void CDECL destroy_sparse_entry(SparseEntry* e)
      |                                                          ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:537:56: warning: ‘cdecl’ attribute ignored [-Wattributes]
  537 |     EXPORT void CDECL destroy_dense_batch(DenseBatch* e)
      |                                                        ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:542:58: warning: ‘cdecl’ attribute ignored [-Wattributes]
  542 |     EXPORT void CDECL destroy_sparse_batch(SparseBatch* e)
      |                                                          ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In instantiation of ‘FeaturedEntryStream<FeatureSetT, StorageT>::FeaturedEntryStream(const char*, bool) [with FeatureSetT = FeatureSet<HalfKP>; StorageT = DenseEntry]’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:457:16:   required from ‘StreamT<FeatureSet<HalfKP>, StorageT>* create_stream(std::string, ArgsTs&& ...) [with StreamT = FeaturedEntryStream; StorageT = DenseEntry; ArgsTs = {const char*&, bool&}; std::string = std::__cxx11::basic_string<char>]’
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:469:92:   required from here
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:378:32: error: no matching function for call to ‘Stream<DenseEntry>::Stream()’
  378 |         Stream(filename, cyclic)
      |                                ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:339:5: note: candidate: ‘Stream<StorageT>::Stream(const char*, bool) [with StorageT = DenseEntry]’
  339 |     Stream(const char* filename, bool cyclic) :
      |     ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:339:5: note:   candidate expects 2 arguments, 0 provided
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:335:8: note: candidate: ‘Stream<DenseEntry>::Stream(Stream<DenseEntry>&&)’
  335 | struct Stream : AnyStream
      |        ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:335:8: note:   candidate expects 1 argument, 0 provided
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In instantiation of ‘FeaturedEntryStream<FeatureSetT, StorageT>::FeaturedEntryStream(const char*, bool) [with FeatureSetT = FeatureSet<HalfKP>; StorageT = SparseEntry]’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:457:16:   required from ‘StreamT<FeatureSet<HalfKP>, StorageT>* create_stream(std::string, ArgsTs&& ...) [with StreamT = FeaturedEntryStream; StorageT = SparseEntry; ArgsTs = {const char*&, bool&}; std::string = std::__cxx11::basic_string<char>]’
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:474:93:   required from here
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:378:32: error: no matching function for call to ‘Stream<SparseEntry>::Stream()’
  378 |         Stream(filename, cyclic)
      |                                ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:339:5: note: candidate: ‘Stream<StorageT>::Stream(const char*, bool) [with StorageT = SparseEntry]’
  339 |     Stream(const char* filename, bool cyclic) :
      |     ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:339:5: note:   candidate expects 2 arguments, 0 provided
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:335:8: note: candidate: ‘Stream<SparseEntry>::Stream(Stream<SparseEntry>&&)’
  335 | struct Stream : AnyStream
      |        ^~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:335:8: note:   candidate expects 1 argument, 0 provided
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In instantiation of ‘FeaturedBatchStream<FeatureSetT, StorageT>::FeaturedBatchStream(const char*, int, bool) [with FeatureSetT = FeatureSet<HalfKP>; StorageT = DenseBatch]’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:457:16:   required from ‘StreamT<FeatureSet<HalfKP>, StorageT>* create_stream(std::string, ArgsTs&& ...) [with StreamT = FeaturedBatchStream; StorageT = DenseBatch; ArgsTs = {const char*&, int&, bool&}; std::string = std::__cxx11::basic_string<char>]’
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:479:104:   required from here
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:405:32: error: no matching function for call to ‘AsyncStream<DenseBatch>::AsyncStream()’
  405 |         m_batch_size(batch_size)
      |                                ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:353:5: note: candidate: ‘AsyncStream<StorageT>::AsyncStream(const char*, bool) [with StorageT = DenseBatch]’
  353 |     AsyncStream(const char* filename, bool cyclic) :
      |     ^~~~~~~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:353:5: note:   candidate expects 2 arguments, 0 provided
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp: In instantiation of ‘FeaturedBatchStream<FeatureSetT, StorageT>::FeaturedBatchStream(const char*, int, bool) [with FeatureSetT = FeatureSet<HalfKP>; StorageT = SparseBatch]’:
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:457:16:   required from ‘StreamT<FeatureSet<HalfKP>, StorageT>* create_stream(std::string, ArgsTs&& ...) [with StreamT = FeaturedBatchStream; StorageT = SparseBatch; ArgsTs = {const char*&, int&, bool&}; std::string = std::__cxx11::basic_string<char>]’
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:484:105:   required from here
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:405:32: error: no matching function for call to ‘AsyncStream<SparseBatch>::AsyncStream()’
  405 |         m_batch_size(batch_size)
      |                                ^
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:353:5: note: candidate: ‘AsyncStream<StorageT>::AsyncStream(const char*, bool) [with StorageT = SparseBatch]’
  353 |     AsyncStream(const char* filename, bool cyclic) :
      |     ^~~~~~~~~~~
/home/vondele/chess/nnue-pytorch/training_data_loader.cpp:353:5: note:   candidate expects 2 arguments, 0 provided
make[2]: *** [CMakeFiles/training_data_loader.dir/build.make:63: CMakeFiles/training_data_loader.dir/training_data_loader.cpp.o] Error 1
make[2]: Leaving directory '/home/vondele/chess/nnue-pytorch'
make[1]: *** [CMakeFiles/Makefile2:76: CMakeFiles/training_data_loader.dir/all] Error 2
make[1]: Leaving directory '/home/vondele/chess/nnue-pytorch'
make: *** [Makefile:130: all] Error 2

@Sopel97 (Member, Author) commented Nov 8, 2020

Okay, it seems GCC is unable to resolve some dependent names that MSVC accepts; members inherited from a template base class need explicit qualification (e.g. this->m_stream). I fixed it.

@vondele (Member) commented Nov 8, 2020

Yes, it compiles now, but it doesn't run:

$ python3 train.py 
Cannot find data_loader shared library.

The lib is named: libtraining_data_loader.so

@Sopel97 (Member, Author) commented Nov 8, 2020

Okay, fixed now.

@vondele (Member) commented Nov 8, 2020

OK, runs... Epoch 0: : 888it [00:16, 53.49it/s, loss=0.009, v_num=28]

@Sopel97 (Member, Author) commented Nov 8, 2020

Can you experiment a bit with the batch size and see what works best for you? For me 8192 seems optimal; maybe twice that is slightly faster, and with more I run out of memory.

@vondele (Member) commented Nov 8, 2020

8192 -> 53it/s
2*8192 -> 36.46it/s
4*8192 -> 19.47it/s
8*8192 ->  9.40it/s
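
Converted to positions per second (it/s times the batch size), those rates work out as below; a quick sketch of the arithmetic:

# it/s from the table above, times positions per iteration (batch size)
for mult, ips in [(1, 53.0), (2, 36.46), (4, 19.47), (8, 9.40)]:
    print(f'{mult}*8192 -> {mult * 8192 * ips / 1000:.0f}k positions/s')
# 434k, 597k, 638k, 616k -- peak throughput at 4*8192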

@vondele (Member) commented Nov 8, 2020

This will of course depend a bit on the GPU. I wonder what the bottleneck will be on a real GPU server (say 4xV100 or similar).

@Sopel97 (Member, Author) commented Nov 8, 2020

So it looks like 4*8192 works best on your system (highest positions/s). Thanks for testing this.

@vondele (Member) commented Nov 8, 2020

That's also at about 87% GPU utilization, so it's probably GPU-limited (RTX 2070 Super) in this case.

model.py (outdated review thread, resolved)
@@ -1,132 +0,0 @@
import chess

@glinscott (Collaborator) commented:

Can we leave this one around for environments where we can't compile the C++ version?

@Sopel97 (Member, Author) commented:

Alternatively we could ship binaries...

@Sopel97 (Member, Author) commented:

You know what, I'll bring back the old data loader tomorrow and make it sparse, just so it's there, because you have a point. I think it's a matter for another PR though.

@glinscott (Collaborator) commented:

Sure, that's fine! Let's get this rolling, it's a gigantic improvement :).

@glinscott (Collaborator) commented:

This is some incredible work, thank you so much! Just a few small comments.

Co-authored-by: Gary Linscott <glinscott@gmail.com>
glinscott merged commit f50106b into official-stockfish:master on Nov 8, 2020