Some bugs found while using #4

Closed
hitluobin opened this issue May 19, 2017 · 22 comments

Comments

@hitluobin

eval.py
line 79: opt.input_fc_h5 = infos['opt'].input_fc_h5 needs to be changed to opt.input_fc_dir = infos['opt'].input_fc_dir
line 80: opt.input_att_h5 = infos['opt'].input_att_h5 needs to be changed to opt.input_att_dir = infos['opt'].input_att_dir

dataloaderraw.py
line 104: img = img.concatenate((img, img, img), axis=2) needs to be changed to img = np.concatenate((img, img, img), axis=2)
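
For context, a minimal sketch of what the corrected dataloaderraw.py line does (the image size is made up, not from the repo): a single-channel grayscale image is tiled along the channel axis so it matches the 3-channel RGB input expected downstream.

import numpy as np

img = np.random.rand(224, 224, 1)                  # stand-in for a loaded grayscale image
if img.ndim == 2:
    img = img[:, :, np.newaxis]                    # make sure a channel axis exists
if img.shape[2] == 1:
    img = np.concatenate((img, img, img), axis=2)  # tile the gray channel into 3 channels
print(img.shape)                                   # (224, 224, 3)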

@ruotianluo
Owner

fixed

@brisker

brisker commented Sep 5, 2017

@ruotianluo
When running eval.py, I get this error:

DataLoaderRaw found  40504  images
Traceback (most recent call last):
  File "eval.py", line 121, in <module>
    vars(opt))
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/eval_utils.py", line 112, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 181, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 97, in sample_beam
    state = self.init_hidden(fc_feats[k:k+1].expand(beam_size, self.rnn_size))
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 652, in expand
    return Expand(sizes)(self)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 115, in forward
    result = i.expand(*self.sizes)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/tensor.py", line 261, in expand
    raise ValueError('incorrect size: only supporting singleton expansion (size=1)')
ValueError: incorrect size: only supporting singleton expansion (size=1)
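
For anyone hitting the same trace, a toy illustration of the constraint behind this error (shapes are assumptions, not taken from the repo): in older PyTorch, expand() can only broadcast dimensions of size 1, so every non-singleton dimension has to keep its original size.

import torch

fc_feats = torch.randn(10, 512)                   # hypothetical (batch, feat) features
beam_size = 3

row = fc_feats[0:1]                               # shape (1, 512): dim 0 is a singleton
ok = row.expand(beam_size, fc_feats.size(1))      # fine: only the size-1 dim is grown
# row.expand(beam_size, 256)                      # fails: 512 is not a singleton dim
print(ok.shape)                                   # torch.Size([3, 512])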

@ruotianluo
Owner

@brisker This should have been fixed. Just note that most of the models in the CaptionModel file are not very good. If you want an idea of the performance of the different models, check #10.

(I haven't even benchmarked the models in CaptionModel.)

@ruotianluo
Owner

@brisker How about now?

@brisker

brisker commented Sep 5, 2017

@ruotianluo

DataLoaderRaw found  40504  images
Traceback (most recent call last):
  File "eval.py", line 121, in <module>
    vars(opt))
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/eval_utils.py", line 112, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 186, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 172, in sample_beam
    output, state = self.core(xt, tmp_fc_feats, tmp_att_feats, state)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 256, in forward
    att = att_feats.view(-1, self.att_feat_size)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 471, in view
    return View(*sizes)(self)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 98, in forward
    result = i.view(*self.sizes)
RuntimeError: input is not contiguous at /home/jcc/pytorch/torch/lib/THC/generic/THCTensor.c:228
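
The usual workaround for this kind of error (a sketch with assumed shapes, not necessarily the exact change pushed to the repo) is to call .contiguous() before .view(), since tensors coming out of expand() or transpose() may not be laid out contiguously in memory.

import torch

att_feat_size = 512
att_feats = torch.randn(1, 14, 14, att_feat_size).expand(3, 14, 14, att_feat_size)

# att = att_feats.view(-1, att_feat_size)              # errors: input is not contiguous
att = att_feats.contiguous().view(-1, att_feat_size)   # copy into contiguous memory first
print(att.shape)                                        # torch.Size([588, 512])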

@ruotianluo
Owner

ruotianluo commented Sep 5, 2017

@brisker Any more bugs?

@brisker

brisker commented Sep 5, 2017

@ruotianluo

DataLoaderRaw found  40504  images
Traceback (most recent call last):
  File "eval.py", line 121, in <module>
    vars(opt))
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/eval_utils.py", line 112, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 186, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/jcc/research/neuraltalk2.pytorch-with_finetune/misc/CaptionModel.py", line 139, in sample_beam
    beam_seq_logprobs_prev = beam_seq_logprobs[:t - 2].clone()
ValueError: result of slicing is an empty tensor

@brisker

brisker commented Sep 5, 2017

@ruotianluo
PS: I am using the with_finetune branch.

@ruotianluo
Owner

@brisker Try changing t-2 to t-1?

I was trying to run it on my own machine, but it's not working right now for some weird reason.
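
A toy illustration of that suggestion (indices and sizes are made up, inferred from the traceback rather than copied from the repo): at the first beam step the slice [:t - 2] selects nothing, which older PyTorch reports as the ValueError above, while [:t - 1] keeps at least the current step.

import torch

beam_seq_logprobs = torch.zeros(16, 3)   # hypothetical (seq_length, beam_size) buffer
t = 2                                    # assumed first step inside the beam loop
empty = beam_seq_logprobs[:t - 2]        # shape (0, 3): nothing selected
prev = beam_seq_logprobs[:t - 1]         # shape (1, 3): what "t-2 to t-1" points at
print(empty.shape, prev.shape)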

@brisker

brisker commented Sep 5, 2017

what is "t-2 to t-1"?
@ruotianluo

@ruotianluo
Owner

@brisker Pushed to the with_finetune branch.

@brisker

brisker commented Sep 5, 2017

@ruotianluo
Error when using eval.py on the with_finetune branch:

DataLoaderRaw found  40504  images
Traceback (most recent call last):
  File "eval.py", line 121, in <module>
    vars(opt))
  File "/home/jcc/research/neuraltalk2.pytorch/eval_utils.py", line 112, in eval_split
    seq, _ = model.sample(fc_feats, att_feats, eval_kwargs)
  File "/home/jcc/research/neuraltalk2.pytorch/models/CaptionModel.py", line 182, in sample
    return self.sample_beam(fc_feats, att_feats, opt)
  File "/home/jcc/research/neuraltalk2.pytorch/models/CaptionModel.py", line 98, in sample_beam
    state = self.init_hidden(fc_feats[k:k+1].expand(beam_size, self.rnn_size))
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 652, in expand
    return Expand(sizes)(self)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 115, in forward
    result = i.expand(*self.sizes)
  File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/tensor.py", line 245, in expand
    raise ValueError('the number of dimensions provided must be greater or equal tensor.dim()')
ValueError: the number of dimensions provided must be greater or equal tensor.dim()

Changing beam_size from 2 to 1 in eval.py works around the bug.
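
For reference, a toy illustration of the dimension-count rule behind this particular error (the 4d feature shape is an assumption about the with_finetune branch, not taken from the repo): expand() has to be given at least one size per tensor dimension.

import torch

fc_feats = torch.randn(10, 1, 1, 2048)   # assumed 4d features on the with_finetune branch
row = fc_feats[0:1]                      # shape (1, 1, 1, 2048)

# row.expand(2, 512)                     # fails: only 2 sizes given for a 4d tensor
ok = row.expand(2, 1, 1, 2048)           # one size per dimension, singleton dim grown
print(ok.shape)                          # torch.Size([2, 1, 1, 2048])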

@brisker

brisker commented Sep 5, 2017

Besides, changing
image_map = self.linear(fc_feats).view(-1, self.num_layers, self.rnn_size).transpose(0, 1)
here
to
image_map = self.linear(fc_feats.squeeze()).view(-1, self.num_layers, self.rnn_size).transpose(0, 1)
fixes the bug when running train.py (no other fix needed for the with_finetune branch).

The error reads like:

Traceback (most recent call last):
 File "train.py", line 254, in <module>
   train(opt)
 File "train.py", line 152, in train
   loss = crit(model(fc_feats, att_feats, labels), labels[:,1:], masks[:,1:])
 File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
   result = self.forward(*input, **kwargs)
 File "/home/jcc/research/neuraltalk2.pytorch/models/CaptionModel.py", line 56, in forward
   state = self.init_hidden(fc_feats)
 File "/home/jcc/research/neuraltalk2.pytorch/models/CaptionModel.py", line 48, in init_hidden
   image_map = self.linear(fc_feats).view(-1, self.num_layers, self.rnn_size).transpose(0, 1)
 File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
   result = self.forward(*input, **kwargs)
 File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/nn/modules/linear.py", line 54, in forward
   return self._backend.Linear()(input, self.weight, self.bias)
 File "/home/jcc/anaconda2/lib/python2.7/site-packages/torch/nn/_functions/linear.py", line 10, in forward
   output.addmm_(0, 1, input, weight.t())
RuntimeError: matrix and matrix expected at /home/jcc/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:237

@ruotianluo

@SJTUzhanglj

here
The feature should first be squeezed from 4d to 2d for the linear operation.
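
A minimal sketch of that squeeze (the layer sizes and 4d layout are assumptions, not the repo's actual values): nn.Linear here expects a 2d (batch, feat) input, so the singleton spatial dims are dropped first.

import torch
import torch.nn as nn

linear = nn.Linear(2048, 512)            # hypothetical fc_feat_size -> rnn_size map
fc_feats = torch.randn(4, 2048, 1, 1)    # assumed 4d CNN output when finetuning

# image_map = linear(fc_feats)           # fails: not a 2d (batch, feat) matrix
image_map = linear(fc_feats.squeeze(3).squeeze(2))   # drop the two singleton spatial dims
print(image_map.shape)                   # torch.Size([4, 512])

Squeezing explicit dimensions rather than a bare .squeeze() also avoids accidentally collapsing the batch dimension when the batch size is 1.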

@ruotianluo
Owner

@brisker I can't reproduce this. What's the size of your fc_feats? Mine is a 2d tensor. (By the way, judging from the error trace, you are not using the latest version.)

@brisker

brisker commented Sep 5, 2017

@ruotianluo
Following @SJTUzhanglj's suggestion and your latest version of the code, the bug has been fixed.

@brisker

brisker commented Sep 22, 2017

@ruotianluo
In each label annotation, why is there a 0 ahead of the caption tokens?
I mean here: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/dataloader.py#L138
the label_batch is defined as:
label_batch[i * seq_per_img : (i + 1) * seq_per_img, 1 : self.seq_length + 1] = seq
but why not
label_batch[i * seq_per_img : (i + 1) * seq_per_img, 0 : self.seq_length + 1] = seq
?
Is this a start token?

@ruotianluo
Owner

0 is the start token. (During prediction it's the end token.)

@brisker

brisker commented Sep 22, 2017

@ruotianluo
So what is the start token during prediction,
and what is the end token during training?
All 0?

@ruotianluo
Owner

Sorry, it may not be clear.
When you predict the next token, 0 means the end token. When you input 0 into the LSTM, it means the start token.
(This is because when you predict the next token, you will never predict the start token, and you will never input the end token to the LSTM.)
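
A toy version of the label layout this implies (sizes are made up, not the dataloader's defaults): column 0 is left at 0 and acts as the start token fed into the LSTM, while the zero padding after the caption acts as the end token the model learns to predict.

import numpy as np

seq_length = 5
seq = np.array([[7, 3, 9, 0, 0]])                        # hypothetical caption word ids
label_batch = np.zeros((1, seq_length + 2), dtype=int)   # room for the start/end zeros
label_batch[:, 1:seq_length + 1] = seq
print(label_batch)                                        # [[0 7 3 9 0 0 0]]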

@brisker

brisker commented Sep 22, 2017

@ruotianluo
two questions

  1. What if 0 is replaced with 1? Nothing different?
  2. What if 0 is used as the start token, and 1 (or some other number) as the end token? Nothing different?

@ruotianluo
Owner

1 is another token.
In principle you could always have two separate indices for the start token and the end token.

I basically follow what neuraltalk2 did, and this looks good to me, so I would keep it as it is.

dmitriy-serdyuk pushed a commit to dmitriy-serdyuk/neuraltalk2.pytorch that referenced this issue Apr 17, 2018