resource exhausted error #10

takuseno · 2018-02-20T16:06:21Z

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,24487,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: deepq/DND/lookup/Tile = Tile[T=DT_FLOAT, Tmultiples=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](deepq/DND/lookup/Tile/input, deepq/DND/lookup/Tile/multiples)]]
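As a sanity check, the size of the failed allocation follows directly from the tensor shape in this error: 32 × 24487 × 512 float32 values at 4 bytes each. That matches the 1.49GiB the allocator reports in the next comment. `tensor_bytes` is a hypothetical helper, not part of the repo:

```python
from math import prod

def tensor_bytes(shape, bytes_per_elem=4):
    """Bytes needed for a dense tensor; float32 uses 4 bytes per element."""
    return prod(shape) * bytes_per_elem

size = tensor_bytes([32, 24487, 512])   # shape of deepq/DND/lookup/Tile output
print(size, f"{size / 2**30:.2f} GiB")  # 1604780032 1.49 GiB
```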

takuseno · 2018-02-20T16:10:11Z

2018-02-21 01:03:53.047199: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.49GiB. Current allocation summary follows.
2018-02-21 01:03:53.047267: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (256): Total Chunks: 64, Chunks in use: 63. 16.0KiB allocated for chunks. 15.8KiB in use in bin. 3.5KiB client-requested in use in bin.
2018-02-21 01:03:53.047305: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (512): Total Chunks: 1, Chunks in use: 1. 512B allocated for chunks. 512B in use in bin. 384B client-requested in use in bin.
2018-02-21 01:03:53.047327: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047347: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2048): Total Chunks: 5, Chunks in use: 5. 10.0KiB allocated for chunks. 10.0KiB in use in bin. 10.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047366: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047385: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047403: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16384): Total Chunks: 1, Chunks in use: 0. 22.2KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047425: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (32768): Total Chunks: 5, Chunks in use: 5. 160.0KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2018-02-21 01:03:53.047447: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (65536): Total Chunks: 3, Chunks in use: 2. 288.0KiB allocated for chunks. 192.0KiB in use in bin. 191.7KiB client-requested in use in bin.
2018-02-21 01:03:53.047471: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (131072): Total Chunks: 11, Chunks in use: 11. 1.47MiB allocated for chunks. 1.47MiB in use in bin. 1.42MiB client-requested in use in bin.
2018-02-21 01:03:53.047492: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (262144): Total Chunks: 7, Chunks in use: 7. 2.67MiB allocated for chunks. 2.67MiB in use in bin. 2.67MiB client-requested in use in bin.
2018-02-21 01:03:53.047511: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (524288): Total Chunks: 2, Chunks in use: 2. 1.89MiB allocated for chunks. 1.89MiB in use in bin. 1.89MiB client-requested in use in bin.
2018-02-21 01:03:53.047535: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (1048576): Total Chunks: 1, Chunks in use: 1. 1.72MiB allocated for chunks. 1.72MiB in use in bin. 1.72MiB client-requested in use in bin.
2018-02-21 01:03:53.047556: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (2097152): Total Chunks: 2, Chunks in use: 1. 5.89MiB allocated for chunks. 3.45MiB in use in bin. 3.45MiB client-requested in use in bin.
2018-02-21 01:03:53.047576: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (4194304): Total Chunks: 1, Chunks in use: 1. 5.17MiB allocated for chunks. 5.17MiB in use in bin. 3.12MiB client-requested in use in bin.
2018-02-21 01:03:53.047601: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (8388608): Total Chunks: 5, Chunks in use: 5. 75.62MiB allocated for chunks. 75.62MiB in use in bin. 75.62MiB client-requested in use in bin.
2018-02-21 01:03:53.047620: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (16777216): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047638: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047657: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2018-02-21 01:03:53.047679: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (134217728): Total Chunks: 4, Chunks in use: 4. 781.25MiB allocated for chunks. 781.25MiB in use in bin. 781.25MiB client-requested in use in bin.
2018-02-21 01:03:53.047700: I tensorflow/core/common_runtime/bfc_allocator.cc:628] Bin (268435456): Total Chunks: 6, Chunks in use: 5. 8.98GiB allocated for chunks. 7.48GiB in use in bin. 7.48GiB client-requested in use in bin.
2018-02-21 01:03:53.047721: I tensorflow/core/common_runtime/bfc_allocator.cc:644] Bin for 1.49GiB was 256.00MiB, Chunk State:
2018-02-21 01:03:53.047746: I tensorflow/core/common_runtime/bfc_allocator.cc:650] Size: 1.49GiB | Requested Size: 6.2KiB | in_use: 0, prev: Size: 1.51GiB | Requested Size: 1.51GiB | in_use: 1

takuseno · 2018-02-20T16:12:14Z

Try running with config.gpu_options.allow_growth = True

smatsumori · 2018-04-03T11:50:41Z

nec/dnd.py

Line 44 in 4f2e3d9

tiled_keys = tf.tile([keys], [tf.shape(h)[0], 1, 1])

The h that comes in has shape (keysize, batchsize), right? And the batch is extended along the last axis.
In that case, tiling by h.shape[0] seems wrong to me.
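One way to check the axis question is to trace shapes. tf.tile multiplies each dimension by the matching entry of its multiples argument, and wrapping keys in a list ([keys]) adds a leading axis of size 1. Assuming keys is [capacity, keysize], the shape-only sketch below (pure Python; `tile_shape` is a hypothetical helper) compares the two readings of h. The tensor in the OOM message is [32, 24487, 512], which matches case 1 and suggests tf.shape(h)[0] was 32 at runtime.

```python
def tile_shape(shape, multiples):
    """Output shape of tf.tile: each dimension is multiplied by its multiple."""
    assert len(shape) == len(multiples)
    return tuple(s * m for s, m in zip(shape, multiples))

capacity, keysize = 24487, 512
wrapped = (1, capacity, keysize)   # [keys] adds a leading axis of size 1

# Case 1: h is [batch, keysize], so tf.shape(h)[0] is the batch size (32 here).
print(tile_shape(wrapped, (32, 1, 1)))       # (32, 24487, 512)
# Case 2: h is [keysize, batch], so tf.shape(h)[0] would be keysize.
print(tile_shape(wrapped, (keysize, 1, 1)))  # (512, 24487, 512)
```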

smatsumori · 2018-04-03T12:24:03Z

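Considering the memory size:
keysize * float32 (32 bits) * batchsize * dndsize * capacity
512 * 32 * 4 * 4 * 5 * 10 ** 5 bits ≈ 16GB
so in principle it's the exponent term (the 10 ** 5 capacity) that dominates. Would parallelizing this well be the quickest fix?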
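The estimate can be reproduced by reading the factors in the order of the labels (keysize 512, float32 as 32 bits, batch size 4, 4 DNDs, capacity 5 × 10^5): the product is in bits, and dividing by 8 gives the ~16GB figure. The second call plugs in the batch size of 32 that appears in the OOM message, which makes the estimate 8x larger. `dnd_tile_bytes` is a hypothetical helper and assumes every DND is tiled at full capacity:

```python
def dnd_tile_bytes(keysize, batch_size, num_dnds, capacity, bits_per_value=32):
    """Bytes needed to tile every DND's keys across the batch at full capacity."""
    total_bits = keysize * bits_per_value * batch_size * num_dnds * capacity
    return total_bits // 8

# Factors in the order of the formula: 512, 32 (bits), 4, 4, 5 * 10 ** 5.
print(dnd_tile_bytes(512, 4, 4, 5 * 10**5) / 1e9)   # 16.384 (GB)
# Same estimate with the batch size of 32 seen in the OOM message.
print(dnd_tile_bytes(512, 32, 4, 5 * 10**5) / 1e9)  # 131.072 (GB)
```

Either way the capacity factor dominates, so any fix has to avoid materializing keys × batch at full capacity.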

takuseno · 2018-04-03T15:02:48Z

while_loop should parallelize automatically, even though it's called a loop.

smatsumori · 2018-04-03T15:43:03Z

I see. What should the while_loop iterate over?
The key index, or the batch?
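A minimal sketch of iterating over the batch, with NumPy standing in for TensorFlow. It assumes h is [batch, keysize], keys is [capacity, keysize], and the lookup needs squared L2 distances between each query and every key (as in the NEC kernel); the function names are hypothetical. Looping over the batch keeps each step's temporary at [capacity, keysize], while looping over the key index would need one iteration per stored key (24487 and growing). In TF, the number of iterations tf.while_loop runs concurrently is capped by its parallel_iterations argument, so peak memory grows with that setting.

```python
import numpy as np

def squared_distances_tiled(h, keys):
    """Whole batch at once: materializes [batch, capacity, keysize] like the tiled lookup."""
    tiled = np.tile(keys[None], (h.shape[0], 1, 1))      # [batch, capacity, keysize]
    return ((tiled - h[:, None, :]) ** 2).sum(axis=-1)    # [batch, capacity]

def squared_distances_per_row(h, keys):
    """Loops over the batch; each step only holds a [capacity, keysize] temporary."""
    out = np.empty((h.shape[0], keys.shape[0]), dtype=h.dtype)
    for i in range(h.shape[0]):
        out[i] = ((keys - h[i]) ** 2).sum(axis=-1)        # [capacity]
    return out

rng = np.random.default_rng(0)
h = rng.standard_normal((8, 16)).astype(np.float32)       # [batch, keysize]
keys = rng.standard_normal((100, 16)).astype(np.float32)  # [capacity, keysize]
print(np.allclose(squared_distances_tiled(h, keys), squared_distances_per_row(h, keys)))  # True
```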

Also, broadcasting apparently saves more memory than tiling explicitly,
so why are we writing it the tiled way?
tensorflow/tensorflow#1934
I'll quickly try this as well.
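A small NumPy sketch of why broadcasting can save memory (TensorFlow's elementwise ops follow the same broadcasting rules). np.tile makes a real copy of keys for every batch row, while np.broadcast_to returns a read-only view whose batch axis has stride 0. The sizes here are illustrative, not the repo's settings. One caveat: an elementwise op such as h[:, None, :] - keys[None] still produces a full [batch, capacity, keysize] result, so broadcasting removes the tiled copy but not necessarily the peak.

```python
import numpy as np

capacity, keysize, batch = 1000, 64, 32
keys = np.zeros((capacity, keysize), dtype=np.float32)

tiled = np.tile(keys[None], (batch, 1, 1))                     # real copy
broadcast = np.broadcast_to(keys, (batch, capacity, keysize))  # read-only view

print(tiled.nbytes)                       # 8192000 bytes actually allocated
print(broadcast.strides[0])               # 0: every batch row reuses the same memory
print(np.shares_memory(broadcast, keys))  # True
```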

Got "Dst tensor is not initialized."
This seems to be raised when GPU memory is full.
aymericdamien/TensorFlow-Examples#38

takuseno · 2018-04-04T01:56:04Z

Can this actually be broadcast?

smatsumori · 2018-04-05T05:41:21Z

Perhaps. I'll try it tonight.
Wouldn't assigning the DNDs to separate GPUs with tf.device solve this?
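A minimal sketch of the placement idea, assuming one DND per action and a round-robin mapping; `assign_dnds_to_gpus` is a hypothetical helper, not part of the repo. Each DND would then be built inside a `with tf.device(name):` block, so its keys and the tiled lookup tensor live on that GPU.

```python
def assign_dnds_to_gpus(num_dnds, num_gpus):
    """Round-robin placement: DND i goes to GPU i % num_gpus."""
    return [f"/device:GPU:{i % num_gpus}" for i in range(num_dnds)]

print(assign_dnds_to_gpus(4, 2))
# ['/device:GPU:0', '/device:GPU:1', '/device:GPU:0', '/device:GPU:1']
# In graph code (hypothetical DND constructor):
#   for name in assign_dnds_to_gpus(num_actions, num_gpus):
#       with tf.device(name):
#           dnds.append(DND(...))
```

With 4 DNDs on 2 GPUs, each device holds roughly half of the DND memory; the Q-values still have to be gathered onto one device afterwards.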

takuseno · 2018-04-05T06:09:20Z

I hope transferring data between GPUs doesn't take a long time.

smatsumori · 2018-04-10T03:22:24Z

Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

References

https://stackoverflow.com/questions/35892412/tensorflow-dense-gradient-explanation

Now running with half tiled, half broadcast.

Checking whether it works.

jlindsey15 · 2018-08-07T17:56:38Z

I'm unable to run the model on any environment but CartPole due to ResourceExhausted errors. Any tips?

smatsumori · 2018-08-11T06:49:57Z

@jlindsey15 Thank you for the comment! If you have multiple GPUs, splitting the DNDs across devices will solve the issue.
