some code missing? #3

Closed
cswhjiang opened this issue May 12, 2017 · 37 comments

cswhjiang commented May 12, 2017

Running python scripts/prepro_labels.py --input_json .../dataset_coco.json --output_json data/cocotalk.json --output_h5 data/cocotalk failed. Here is the error:

Traceback (most recent call last):
  File "scripts/prepro_labels.py", line 192, in <module>
    main(params)
  File "scripts/prepro_labels.py", line 138, in main
    imgs = imgs['images']
TypeError: list indices must be integers, not str

It seems that some code is missing.

ruotianluo (Owner)

Did you change --input_json .../dataset_coco.json to your own path?
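(For context, a minimal check, assuming the standard Karpathy-split dataset_coco.json: its top level is a dict with an "images" key, so the error above usually means --input_json points at a file whose top level is a list instead.)

    import json

    # The expected Karpathy-split file is a dict: {"images": [...], "dataset": "coco"}
    imgs = json.load(open('data/dataset_coco.json'))   # path is wherever you downloaded it
    print(type(imgs))        # <class 'dict'>  -> imgs['images'] works
    # If the file's top level were a list (e.g. a different annotation dump),
    # imgs['images'] would raise: TypeError: list indices must be integers, not str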

cswhjiang (Author)

Thanks. I found the reason just now.

brisker commented Sep 4, 2017

@ruotianluo
For the "show_attend_tell" model, where is the code that visualizes the visual attention on the input image?

ruotianluo (Owner)

Sadly there isn't any; I felt it would be a little bit messy to add.
But in principle, you can always save the alphas as a member variable and visualize them using the code in arctic-captions.

brisker commented Sep 5, 2017

@ruotianluo
I do not quite understand what you mean...
What is "alphas"?
What is "arctic-captions"?
Besides, could you please provide a demo of the attention visualization for the show_attend_tell model?
(if convenient :) ) Thanks a lot.

ruotianluo (Owner)

Sorry, what I mean by alpha is the attention map, which is named weight in my code.
alphas: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/CaptionModel.py#L269
arctic-caption: https://github.com/kelvinxu/arctic-captions/blob/master/alpha_visualization.ipynb

You can save the weights at each timestep and visualize them using the last block of alpha_visualization.ipynb
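As a rough sketch (not the repo's actual code), saving the per-step attention could look like this, assuming a member list self.alphas added to the model and that weight is the batch x att_size attention tensor computed at each step:

    # once per image/batch, before the decoding loop:
    self.alphas = []

    # inside the decoding loop, right after weight (the attention map) is computed:
    self.alphas.append(weight.data.cpu().clone())   # keep a copy for later visualization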

brisker commented Sep 6, 2017

@ruotianluo
why use

self.rnn = getattr(nn, self.rnn_type.upper())(self.input_encoding_size + self.att_feat_size, 
                self.rnn_size, self.num_layers, bias=False, dropout=self.drop_prob_lm)

here?
Why not directly use nn.LSTM or nn.GRU?

ruotianluo (Owner)

@brisker In principle it allows you to use GRU instead of LSTM; however, I forget whether I tested it or not.
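For illustration, a minimal sketch of why that one line supports both cells (the sizes here are made up):

    import torch.nn as nn

    rnn_type = 'lstm'                      # comes from opt.rnn_type
    # getattr(nn, 'LSTM') is exactly nn.LSTM, so the same line handles both choices:
    rnn = getattr(nn, rnn_type.upper())(512, 512, 1, bias=False, dropout=0.5)
    # equivalent to nn.LSTM(512, 512, 1, bias=False, dropout=0.5) when rnn_type == 'lstm',
    # or nn.GRU(512, 512, 1, bias=False, dropout=0.5) when rnn_type == 'gru'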

brisker commented Sep 6, 2017

@ruotianluo
??? Whether it's GRU or LSTM is decided by opt.rnn_type, right?
Why bother writing something like

self.rnn = getattr(nn, self.rnn_type.upper())(self.input_encoding_size + self.att_feat_size, 
                self.rnn_size, self.num_layers, bias=False, dropout=self.drop_prob_lm)

ruotianluo (Owner)

@brisker Why not?

brisker commented Sep 6, 2017

@ruotianluo
I'm new to image captioning; two questions:

  1. what does this variable ss_prob mean?
  2. what does this variable masks mean?

ruotianluo (Owner)

ss_prob is the scheduled sampling probability.
masks indicate how long each caption is.
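For context, a rough sketch of how length masks are typically built and applied (illustrative, not copied from the repo):

    import torch

    # Hypothetical example: two captions padded to length 5, with true lengths 3 and 5.
    lengths = torch.tensor([3, 5])
    max_len = 5
    masks = (torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)).float()
    # masks = [[1, 1, 1, 0, 0],
    #          [1, 1, 1, 1, 1]]
    # The per-token cross-entropy loss is multiplied by masks, so padded positions
    # after each caption's end contribute nothing.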

brisker commented Sep 6, 2017

@ruotianluo
About the attention map that is named weight in your code
(alphas: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/CaptionModel.py#L269):
did you mean that by just resizing the weight variable to the size of the input image, we get an attention map that is ready to be overlaid on the input image for visualization?

ruotianluo (Owner)

@brisker Yes. Note that weight is flattened; you should first reshape it to 7x7. (I forgot to mention: this show_attend_tell is not exactly the same as described in the paper; it's simplified a little bit.)
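A rough visualization sketch along the lines of the arctic-captions notebook (assuming weight is one timestep's attention as a plain tensor of shape (49,), and img is the 224x224 input image as a numpy array):

    import skimage.transform
    import matplotlib.pyplot as plt

    alpha = weight.view(7, 7).cpu().numpy()                                     # un-flatten the 7x7 map
    alpha_img = skimage.transform.pyramid_expand(alpha, upscale=32, sigma=20)   # 7 * 32 = 224
    plt.imshow(img)
    plt.imshow(alpha_img, alpha=0.6, cmap='gray')                               # overlay the attention
    plt.show()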

brisker commented Sep 6, 2017

@ruotianluo
I am new to image captioning and do not quite understand how the model works...
Could you please explain a little where the simplification is in your code for the show_attend_tell model, compared to the original paper?
Here?

att_feats = cnn_model(images).permute(0, 2, 3, 1)
fc_feats = att_feats.mean(2).mean(1)

You seem to just perform average pooling on the conv features to get fc_feats.

ruotianluo (Owner)

@brisker That part is because I'm using a resnet.
The network details are different, but the main difference is that I didn't add the doubly stochastic attention from the paper.
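For reference, the doubly stochastic attention in the paper is an extra loss term encouraging the attention over all timesteps to sum to roughly 1 at every spatial location. A hypothetical sketch, assuming alphas is a (T, batch, 49) tensor of the per-step attention weights:

    att_sum = alphas.sum(0)                                    # batch * 49: total attention per location
    doubly_stochastic_penalty = ((1.0 - att_sum) ** 2).sum(1).mean()
    loss = xe_loss + lambda_reg * doubly_stochastic_penalty    # lambda_reg is a hyperparameter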

brisker commented Sep 7, 2017

@ruotianluo
Do you mean that by setting the scheduled sampling probability to a value larger than 0.0, the model becomes the stochastic "hard" attention model described in the show_attend_tell paper?

ruotianluo (Owner)

No, scheduled sampling is a separate technique that is not mentioned in the show attend tell paper; you can google the scheduled sampling paper.
I forgot to mention: hard attention is another thing I didn't implement here.

brisker commented Sep 7, 2017

@ruotianluo

  1. Did you mean that the scheduled sampling paper corresponds only to this code?
            if self.training and i >= 1 and self.ss_prob > 0.0: # otherwise no need to sample
                sample_prob = fc_feats.data.new(batch_size).uniform_(0, 1)

                #print("sample_prob:")
                #print(sample_prob.size())
                sample_mask = sample_prob < self.ss_prob
                if sample_mask.sum() == 0:
                    it = seq[:, i].clone()
                else:
                    sample_ind = sample_mask.nonzero().view(-1)
                    it = seq[:, i].data.clone()
                    #prob_prev = torch.exp(outputs[-1].data.index_select(0, sample_ind)) # fetch prev distribution: shape Nx(M+1)
                    #it.index_copy_(0, sample_ind, torch.multinomial(prob_prev, 1).view(-1))
                    prob_prev = torch.exp(outputs[-1].data) # fetch prev distribution: shape Nx(M+1)
                    it.index_copy_(0, sample_ind, torch.multinomial(prob_prev, 1).view(-1).index_select(0, sample_ind))
                    it = Variable(it, requires_grad=False)
  2. What are the benefits of scheduled sampling?

ruotianluo (Owner) commented Sep 7, 2017

  1. Yes, it replaces the network's input with a sampled output, by chance.
  2. It's designed to address the discrepancy between training and test time. In practice its effect depends on the model: FCModel doesn't need scheduled sampling, but ShowTell performs better with it.

brisker commented Sep 28, 2017

@ruotianluo
Weird and a bit embarrassing to ask, but when I run inference with the same ShowTellModel and the same image, why are the results different from one run to the next? (I modified ShowTellModel a little: at time step 0 I feed the LSTM the fc_feats through an image embedding layer, and at time step 1 I feed the start token.)
Any idea why this happens?

ruotianluo (Owner)

Did you set the model to evaluation mode?
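(For context: dropout layers behave stochastically in training mode, so two forward passes on the same image can produce different captions. A minimal check, assuming greedy decoding rather than sampling:)

    model.eval()   # switches dropout to inference behaviour -> deterministic outputs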

brisker commented Sep 28, 2017

@ruotianluo
yes

ruotianluo (Owner)

How different are the results?

brisker commented Sep 29, 2017

@ruotianluo
Technically, in an attention model, it is not mandatory to concat the att_feats (output of the CNN) and the LSTM hidden state to form the LSTM input, right?

ruotianluo (Owner)

Yes, as long as it's mathematically equivalent.

brisker commented Sep 29, 2017

@ruotianluo
So what is the key idea of an attention model? Dynamically using the fc_feats and the LSTM hidden state to compute a weight tensor, and visualizing this weight tensor on the input image?

ruotianluo (Owner)

The idea is that you can look at a different part of the image at each time step.

brisker commented Sep 29, 2017

@ruotianluo
So you mean that at every time step we use the fc_feats and the LSTM hidden state to compute the attention weight tensor, and the fc_feats is different at every time step? But actually it is the same, right? (Because the CNN forward pass has already been completed.) I am a little puzzled here... which variable's change leads to "looking at a different part of the image at each time step"?

ruotianluo (Owner)

We don't use fc_feats to compute the weight; we use att_feats. The hidden state is what changes.

brisker commented Sep 29, 2017

Thanks for all your replies :)
@ruotianluo
So it's the changing hidden state that leads to "looking at a different part of the image at each time step"? But att_feats does not change during the unrolling of the LSTM, right? If we do not concat att_feats and the hidden state as the LSTM input, it seems that the attention is only related to the LSTM... Is it common in existing attention models not to concat? Pros and cons?

ruotianluo (Owner)

att_feats change over spatial locations, hidden states change over time. And the output of the attention module is a weighted sum of att_feats.
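A rough sketch of this soft-attention computation (the layer names att_embed, h_embed, and alpha_net are illustrative placeholders, not the repo's exact code):

    import torch
    import torch.nn.functional as F

    D, A, H = 2048, 512, 512
    att_embed = torch.nn.Linear(D, A)
    h_embed = torch.nn.Linear(H, A)
    alpha_net = torch.nn.Linear(A, 1)

    # att_feats: batch * 49 * D   (fixed spatial features, one per location)
    # h:         batch * H        (LSTM hidden state, changes every time step)
    att_proj = att_embed(att_feats)                                  # batch * 49 * A
    h_proj = h_embed(h).unsqueeze(1)                                 # batch * 1  * A
    scores = alpha_net(torch.tanh(att_proj + h_proj)).squeeze(2)     # batch * 49
    weight = F.softmax(scores, dim=1)                                # the "alpha" attention map
    att_res = torch.bmm(weight.unsqueeze(1), att_feats).squeeze(1)   # batch * D, weighted sum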

ruotianluo (Owner)

Technically I don't concat, but mathematically it's equivalent. I wrote it this way to avoid duplicate computation.

brisker commented Sep 29, 2017

@ruotianluo
Here: https://github.com/ruotianluo/neuraltalk2.pytorch/blob/master/models/OldModel.py#L224

    att_feats_ = att_feats.view(-1, att_size, self.att_feat_size) # batch * att_size * att_feat_size
    att_res = torch.bmm(weight.unsqueeze(1), att_feats_).squeeze(1) # batch * att_feat_size
    output, state = self.rnn(torch.cat([xt, att_res], 1).unsqueeze(0), state)

xt is what is fed in at each time step, and att_res is a weighted sum of att_feats, right?
But if I do not concat xt and att_res (use only xt), there is obviously no weighted sum of att_feats, right? If there is no concat, only the hidden state changes and there is no weighted sum of att_feats; is that reasonable?

ruotianluo (Owner)

Ok, it seems that I misunderstood your question. Yes, if you don't concat att_res here, it's not an attention model, and there's no visualization either, because there's no training signal to the attention module.

brisker commented Sep 29, 2017

Thanks for all your replies :)
@ruotianluo
So it is necessary to combine xt and att_feats in an attention model, right?
Besides, are there any other operations for combining these two variables, apart from concat?

ruotianluo (Owner)

There are a lot of different fusion types proposed in the VQA literature; you can check them out. The easiest alternative is an elementwise product.
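For illustration, a tiny sketch of the two fusion options (assuming xt and att_res are tensors already projected to the same size D):

    fused_concat = torch.cat([xt, att_res], 1)   # concat: batch * 2D, what the repo's code above does
    fused_prod = xt * att_res                    # elementwise product: batch * D
    # The product keeps the input dimensionality fixed, at the cost of requiring
    # xt and att_res to share the same feature size.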
