This repository has been archived by the owner on Aug 3, 2022. It is now read-only.

Bug in CPU training (related to TensorFlow)? #2

Closed
tim5go opened this issue Jul 24, 2017 · 5 comments

Comments

@tim5go

tim5go commented Jul 24, 2017

As I observed, an out-of-vocabulary error is thrown when using "embedding_lookup".
The error looks like:

InvalidArgumentError (see above for traceback): indices[0,1,3] = 6501 is not in [0, 6342)
[[Node: model_1/embedding_lookup = Gather[Tindices=DT_INT64, Tparams=DT_FLOAT, _class=["loc:@model/embedding"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](model/embedding/read, _recv_model_1/inputs_0)]]

See geek-ai/irgan#9

Nothing is wrong with your code; it sounds like a known issue in TensorFlow (CPU version).
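
In other words, some token id in the batch (6501) is greater than or equal to the embedding table's row count (6342). A minimal defensive sketch, assuming a TF 1.x graph; the names and sizes below are illustrative, not the repo's actual variables:

```python
import tensorflow as tf

# Illustrative sizes only (not the repo's actual values).
vocab_size = 6342
embedding_dim = 128

inputs = tf.placeholder(tf.int64, shape=[None, None], name="inputs")
embedding = tf.get_variable("embedding", [vocab_size, embedding_dim], dtype=tf.float32)

# Guard against out-of-range ids: map anything >= vocab_size back to 0 (UNK),
# mirroring what the preprocessing step is expected to do.
safe_inputs = tf.where(inputs < vocab_size, inputs, tf.zeros_like(inputs))

embedded = tf.nn.embedding_lookup(embedding, safe_inputs)
```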

@indiejoseph
Owner

Did you obtain the TFRecord training file via data_helper? Line 149 uses dict.get, so any OOV index is replaced with 0 (the UNK index); there is no way an index can fall outside the specified vocab.pkl.
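
For reference, the kind of mapping described above looks roughly like this (hypothetical names; see data_helper.py line 149 for the actual code):

```python
UNK_INDEX = 0

def tokens_to_ids(tokens, vocab):
    # dict.get falls back to UNK_INDEX for any out-of-vocabulary token,
    # so no id should ever exceed len(vocab) - 1.
    return [vocab.get(token, UNK_INDEX) for token in tokens]
```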

@tim5go
Author

tim5go commented Jul 24, 2017

I used: python train.py
So I assumed it was already using data_helper.
I will try to switch it to a GPU environment to see if the problem still persists.

BTW, may I know the corpus size of your char-rnn model? It seems to be quite RAM-consuming.
For 2GB of Apple Daily raw text, memory consumption spikes to 60GB (although it goes down again later).

It would be nice to know your original corpus size ><"

@tim5go
Author

tim5go commented Jul 24, 2017

After changing the line:
https://github.com/indiejoseph/doc-han-att/blob/master/model.py#L30
from cpu to gpu, the error goes away.

^^
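
A note on why the error disappears: in TF 1.x, tf.gather (which backs embedding_lookup) raises InvalidArgumentError for out-of-range indices on CPU, but on GPU it silently writes zeros for them, so moving the op to the GPU hides the bad ids rather than fixing them. A minimal sketch of that device placement, with illustrative names and sizes:

```python
import tensorflow as tf

vocab_size, embedding_dim = 6342, 128  # illustrative sizes
inputs = tf.placeholder(tf.int64, shape=[None, None], name="inputs")

# Pinning the lookup to the GPU, as described above. Out-of-range ids no
# longer raise here, but they still produce all-zero embeddings.
with tf.device("/gpu:0"):
    embedding = tf.get_variable("embedding", [vocab_size, embedding_dim])
    embedded = tf.nn.embedding_lookup(embedding, inputs)
```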

@indiejoseph
Owner

I used <1GB of Apple Daily news text to train the char-rnn LM.
It is so weird; I was able to train this model on CPU successfully. What is your TensorFlow version?

@tim5go
Author

tim5go commented Jul 25, 2017

Um... maybe one of the reasons is that my vocabulary size is smaller than yours (i.e. yours is 6790);
I only got 6342 from a 300MB corpus.
Also, I am using TensorFlow 1.1 with Python 3.
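
Given the mismatch in the error (ids up to 6501 vs. a 6342-row embedding), one quick sanity check is to scan the generated TFRecords for the largest token id. This is only a sketch: the file path and feature key below are guesses and need to match what data_helper actually writes:

```python
import tensorflow as tf

# Hypothetical check: confirm no token id in the training file reaches the
# embedding size. "train.tfrecords" and the "tokens" feature key are guesses.
VOCAB_SIZE = 6342
max_id = 0
for record in tf.python_io.tf_record_iterator("train.tfrecords"):
    example = tf.train.Example()
    example.ParseFromString(record)
    ids = example.features.feature["tokens"].int64_list.value
    if ids:
        max_id = max(max_id, max(ids))

print("max token id:", max_id, "embedding rows:", VOCAB_SIZE)
assert max_id < VOCAB_SIZE, "training data contains out-of-range token ids"
```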
