evaluation for image retrieval does not work. numpy array size mismatch #36

Open

adalisan opened this issue Nov 20, 2019 · 2 comments

@adalisan

I get the tensor size error shown at the end. The command I am running is:

```
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1
```

I get the same error if I try the zero-shot evaluation as well:

```
python3 ./eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot
```


```
11/20/2019 16:54:03 - INFO - vilbert.vilbert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - vilbert.basebert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
11/20/2019 16:54:03 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/sadali/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
11/20/2019 16:54:03 - INFO - vilbert.task_utils -   Loading RetrievalFlickr30k Dataset with batch size 1
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   loading archive file save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": true,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

  Num Iters:  {'TASK3': 10000}
  Batch size:  {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 235, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = vil_logit.view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (250) into shape (500)
```
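
One observation (a guess, not a confirmed diagnosis): the log reports n_gpu: 2, and the input array has shape (250), which is exactly 500 / 2. The failing line writes scores for each caption in chunks of 500 images, so it looks like the multi-GPU setup splits that chunk and only part of the scores reach the assignment. A minimal sketch of the mismatch, with score_matrix dimensions assumed purely for illustration:

```python
import numpy as np

# Hypothetical sizes for illustration; only the 500-wide slice matters here.
num_captions, num_images = 5000, 1000
score_matrix = np.zeros((num_captions, num_images), dtype=np.float32)

caption_idx, image_idx = 0, 0
vil_logit = np.zeros(250, dtype=np.float32)  # 500 / n_gpu scores with 2 GPUs

# The slice expects 500 values, so numpy raises:
# ValueError: could not broadcast input array from shape (250) into shape (500)
score_matrix[caption_idx, image_idx * 500:(image_idx + 1) * 500] = vil_logit
```

If that is what is happening, running the evaluation on a single GPU (for example by prefixing the command with CUDA_VISIBLE_DEVICES=0) should make the shapes line up again.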

@ChanningPing

I'm having a similar issue:
```
06/29/2020 20:52:32 - INFO - __main__ - device: cuda n_gpu: 4, distributed training: False, 16-bits training: False
06/29/2020 20:52:32 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ubuntu/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
06/29/2020 20:52:33 - INFO - vilbert.task_utils - Loading RetrievalFlickr30k Dataset with batch size 1
06/29/2020 20:52:37 - INFO - vilbert.vilbert - loading archive file save/bert_base_6_layer_6_connect/pytorch_model_9.bin
06/29/2020 20:52:37 - INFO - vilbert.vilbert - Model config {
"attention_probs_dropout_prob": 0.1,
"bi_attention_type": 1,
"bi_hidden_size": 1024,
"bi_intermediate_size": 1024,
"bi_num_attention_heads": 8,
"fast_mode": true,
"fixed_t_layer": 0,
"fixed_v_layer": 0,
"fusion_method": "mul",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"in_batch_pairs": false,
"initializer_range": 0.02,
"intermediate_size": 3072,
"intra_gate": false,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pooling_method": "mul",
"predict_feature": false,
"t_biattention_id": [
6,
7,
8,
9,
10,
11
],
"type_vocab_size": 2,
"v_attention_probs_dropout_prob": 0.1,
"v_biattention_id": [
0,
1,
2,
3,
4,
5
],
"v_feature_size": 2048,
"v_hidden_act": "gelu",
"v_hidden_dropout_prob": 0.1,
"v_hidden_size": 1024,
"v_initializer_range": 0.02,
"v_intermediate_size": 1024,
"v_num_attention_heads": 8,
"v_num_hidden_layers": 6,
"v_target_size": 1601,
"vocab_size": 30522,
"with_coattention": true
}

model's option for predict_feature is False
06/29/2020 20:52:44 - INFO - vilbert.vilbert - Weights from pretrained model not used in BertForMultiModalPreTraining: ['bert.encoder.v_layer.6.attention.self.query.weight', 'bert.encoder.v_layer.6.attention.self.query.bias', 'bert.encoder.v_layer.6.attention.self.key.weight', 'bert.encoder.v_layer.6.attention.self.key.bias', 'bert.encoder.v_layer.6.attention.self.value.weight', 'bert.encoder.v_layer.6.attention.self.value.bias', 'bert.encoder.v_layer.6.attention.output.dense.weight', 'bert.encoder.v_layer.6.attention.output.dense.bias', 'bert.encoder.v_layer.6.attention.output.LayerNorm.weight', 'bert.encoder.v_layer.6.attention.output.LayerNorm.bias', 'bert.encoder.v_layer.6.intermediate.dense.weight', 'bert.encoder.v_layer.6.intermediate.dense.bias', 'bert.encoder.v_layer.6.output.dense.weight', 'bert.encoder.v_layer.6.output.dense.bias', 'bert.encoder.v_layer.6.output.LayerNorm.weight', 'bert.encoder.v_layer.6.output.LayerNorm.bias', 'bert.encoder.v_layer.7.attention.self.query.weight', 'bert.encoder.v_layer.7.attention.self.query.bias', 'bert.encoder.v_layer.7.attention.self.key.weight', 'bert.encoder.v_layer.7.attention.self.key.bias', 'bert.encoder.v_layer.7.attention.self.value.weight', 'bert.encoder.v_layer.7.attention.self.value.bias', 'bert.encoder.v_layer.7.attention.output.dense.weight', 'bert.encoder.v_layer.7.attention.output.dense.bias', 'bert.encoder.v_layer.7.attention.output.LayerNorm.weight', 'bert.encoder.v_layer.7.attention.output.LayerNorm.bias', 'bert.encoder.v_layer.7.intermediate.dense.weight', 'bert.encoder.v_layer.7.intermediate.dense.bias', 'bert.encoder.v_layer.7.output.dense.weight', 'bert.encoder.v_layer.7.output.dense.bias', 'bert.encoder.v_layer.7.output.LayerNorm.weight', 'bert.encoder.v_layer.7.output.LayerNorm.bias', 'bert.encoder.c_layer.6.biattention.query1.weight', 'bert.encoder.c_layer.6.biattention.query1.bias', 'bert.encoder.c_layer.6.biattention.key1.weight', 'bert.encoder.c_layer.6.biattention.key1.bias', 'bert.encoder.c_layer.6.biattention.value1.weight', 'bert.encoder.c_layer.6.biattention.value1.bias', 'bert.encoder.c_layer.6.biattention.query2.weight', 'bert.encoder.c_layer.6.biattention.query2.bias', 'bert.encoder.c_layer.6.biattention.key2.weight', 'bert.encoder.c_layer.6.biattention.key2.bias', 'bert.encoder.c_layer.6.biattention.value2.weight', 'bert.encoder.c_layer.6.biattention.value2.bias', 'bert.encoder.c_layer.6.biOutput.dense1.weight', 'bert.encoder.c_layer.6.biOutput.dense1.bias', 'bert.encoder.c_layer.6.biOutput.LayerNorm1.weight', 'bert.encoder.c_layer.6.biOutput.LayerNorm1.bias', 'bert.encoder.c_layer.6.biOutput.q_dense1.weight', 'bert.encoder.c_layer.6.biOutput.q_dense1.bias', 'bert.encoder.c_layer.6.biOutput.dense2.weight', 'bert.encoder.c_layer.6.biOutput.dense2.bias', 'bert.encoder.c_layer.6.biOutput.LayerNorm2.weight', 'bert.encoder.c_layer.6.biOutput.LayerNorm2.bias', 'bert.encoder.c_layer.6.biOutput.q_dense2.weight', 'bert.encoder.c_layer.6.biOutput.q_dense2.bias', 'bert.encoder.c_layer.6.v_intermediate.dense.weight', 'bert.encoder.c_layer.6.v_intermediate.dense.bias', 'bert.encoder.c_layer.6.v_output.dense.weight', 'bert.encoder.c_layer.6.v_output.dense.bias', 'bert.encoder.c_layer.6.v_output.LayerNorm.weight', 'bert.encoder.c_layer.6.v_output.LayerNorm.bias', 'bert.encoder.c_layer.6.t_intermediate.dense.weight', 'bert.encoder.c_layer.6.t_intermediate.dense.bias', 'bert.encoder.c_layer.6.t_output.dense.weight', 'bert.encoder.c_layer.6.t_output.dense.bias', 'bert.encoder.c_layer.6.t_output.LayerNorm.weight', 
'bert.encoder.c_layer.6.t_output.LayerNorm.bias', 'bert.encoder.c_layer.7.biattention.query1.weight', 'bert.encoder.c_layer.7.biattention.query1.bias', 'bert.encoder.c_layer.7.biattention.key1.weight', 'bert.encoder.c_layer.7.biattention.key1.bias', 'bert.encoder.c_layer.7.biattention.value1.weight', 'bert.encoder.c_layer.7.biattention.value1.bias', 'bert.encoder.c_layer.7.biattention.query2.weight', 'bert.encoder.c_layer.7.biattention.query2.bias', 'bert.encoder.c_layer.7.biattention.key2.weight', 'bert.encoder.c_layer.7.biattention.key2.bias', 'bert.encoder.c_layer.7.biattention.value2.weight', 'bert.encoder.c_layer.7.biattention.value2.bias', 'bert.encoder.c_layer.7.biOutput.dense1.weight', 'bert.encoder.c_layer.7.biOutput.dense1.bias', 'bert.encoder.c_layer.7.biOutput.LayerNorm1.weight', 'bert.encoder.c_layer.7.biOutput.LayerNorm1.bias', 'bert.encoder.c_layer.7.biOutput.q_dense1.weight', 'bert.encoder.c_layer.7.biOutput.q_dense1.bias', 'bert.encoder.c_layer.7.biOutput.dense2.weight', 'bert.encoder.c_layer.7.biOutput.dense2.bias', 'bert.encoder.c_layer.7.biOutput.LayerNorm2.weight', 'bert.encoder.c_layer.7.biOutput.LayerNorm2.bias', 'bert.encoder.c_layer.7.biOutput.q_dense2.weight', 'bert.encoder.c_layer.7.biOutput.q_dense2.bias', 'bert.encoder.c_layer.7.v_intermediate.dense.weight', 'bert.encoder.c_layer.7.v_intermediate.dense.bias', 'bert.encoder.c_layer.7.v_output.dense.weight', 'bert.encoder.c_layer.7.v_output.dense.bias', 'bert.encoder.c_layer.7.v_output.LayerNorm.weight', 'bert.encoder.c_layer.7.v_output.LayerNorm.bias', 'bert.encoder.c_layer.7.t_intermediate.dense.weight', 'bert.encoder.c_layer.7.t_intermediate.dense.bias', 'bert.encoder.c_layer.7.t_output.dense.weight', 'bert.encoder.c_layer.7.t_output.dense.bias', 'bert.encoder.c_layer.7.t_output.LayerNorm.weight', 'bert.encoder.c_layer.7.t_output.LayerNorm.bias']
Num Iters: {'TASK3': 10000}
Batch size: {'TASK3': 1}
Traceback (most recent call last):
File "eval_retrieval.py", line 275, in
main()
File "eval_retrieval.py", line 230, in main
score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = torch.softmax(vil_logit, dim=1)[:,0].view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (125) into shape (500)
```
I followed the instructions for zero-shot image retrieval exactly. Any idea what is missing?
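
The same ratio shows up here: the log reports n_gpu: 4 and the input array has shape (125), i.e. 500 / 4. If the multi-GPU split is the cause in both cases, a possible workaround (an untested sketch, not a verified fix) is to expose only one device before torch initializes CUDA:

```python
import os

# Must run before anything touches CUDA (e.g. at the top of eval_retrieval.py),
# or equivalently on the shell: CUDA_VISIBLE_DEVICES=0 python eval_retrieval.py ...
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # expect 1, so each 500-pair chunk stays intact
```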
