Problem with training #52
Comments
Try gzip train.0. Parse, I think, will check for the number and expect the .gz on its own.
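In other words, something like the following (a hedged sketch; the build/ paths and whether the same parse.py invocation then picks up the gzipped chunks are assumptions based on this comment, not verified against the repo):
gzip build/train.0 build/train.1 build/train.2
training/tf/parse.py build/train.0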
Next problem. I have:
name: GeForce GTX 760 major: 3 minor: 0 memoryClockRate(GHz): 1.0845
and I get this:
2017-11-14 21:29:51.688804: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 227.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
2017-11-14 21:29:51.692699: W tensorflow/stream_executor/cuda/cuda_dnn.cc:2223]
2017-11-14 21:29:51.693036: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 227.25MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.
I had to halve the BATCH_SIZE in training/tf/parse.py (set it to 128) to avoid (I think?) that error on my GTX 950. I may have had to lower per_process_gpu_memory_fraction a little, too.
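For anyone applying the same workaround, the edits described above amount to roughly the sketch below. The variable names, the surrounding session setup, and the 0.6 value are assumptions about how training/tf/parse.py is structured (TensorFlow 1.x style), not a verbatim excerpt:
# Hypothetical excerpt in the spirit of training/tf/parse.py; names and values are assumptions.
import tensorflow as tf
BATCH_SIZE = 128  # halved from 256 so training fits on a ~2 GB card such as a GTX 950/760
config = tf.ConfigProto()
# Cap TensorFlow's GPU memory use (it grabs everything by default); lowered a little from 0.75.
config.gpu_options.per_process_gpu_memory_fraction = 0.6
session = tf.Session(config=config)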
Thanks, it works now! Even if 224.638 pos/s is, I guess, not that good.
You can probably use larger batches if you set it higher, not lower. TensorFlow defaults to using 100% of GPU RAM, but this is annoying if you want to run leelaz at the same time, so I changed the default to 75%. If you lower the batch size, you should also lower the learning rate in MomentumOptimizer a bit, by the square root of the factor you lowered the batch size by.
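As a worked example of that rule: going from a batch size of 256 down to 128 is a factor of 0.5, so the learning rate should be multiplied by sqrt(0.5) ≈ 0.71. In TensorFlow 1.x code this looks roughly like the sketch below; the 0.05 base rate and 0.9 momentum are placeholder values, not the repository's actual settings:
import math
import tensorflow as tf
base_lr = 0.05                           # placeholder for the original learning rate
batch_factor = 128.0 / 256.0             # batch size was halved
lr = base_lr * math.sqrt(batch_factor)   # about 0.035
optimizer = tf.train.MomentumOptimizer(learning_rate=lr, momentum=0.9)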
@marcocalignano Is this issue ready to close? :) |
I am trying to train the net
marcuzzo@marcuzzo: ~/workspace/leela-zero[next]$ training/tf/parse.py build/train.0
Found 0 chunks
marcuzzo@marcuzzo: ~/workspace/leela-zero[next]$ training/tf/parse.py build/train.1
Found 0 chunks
marcuzzo@marcuzzo: ~/workspace/leela-zero[next]$ training/tf/parse.py build/train.2
Found 0 chunks
What am I doing wrong?