RuntimeError: invalid argument 2: Input tensor must have same size as output tensor apart from the specified dimension at /opt/conda/conda #2

Open
KevinQian97 opened this issue Sep 6, 2018 · 5 comments

@KevinQian97

Hello, I used your code to train, but training terminates after the first iteration.
Could you please help me find the problem?
Thank you.
Here is my traceback:
[session 1][epoch 1][iter 0] loss: 4.0006, lr: 1.00e-02
fg/bg=(128/384), time cost: 7.218862
rpn_cls: 0.6919, rpn_box: 0.1386, rcnn_cls: 2.8319, rcnn_box 0.3382
Traceback (most recent call last):
  File "trainval_net.py", line 330, in <module>
    roi_labels = FPN(im_data, im_info, gt_boxes, num_boxes)
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/zhiqi.cheng/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
RuntimeError: invalid argument 2: Input tensor must have same size as output tensor apart from the specified dimension at /opt/conda/conda-bld/pytorch_1518238409320/work/torch/lib/THC/generic/THCTensorScatterGather.cu:29
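
The last line of the traceback states a shape constraint raised from the scatter/gather source file: the input and output tensors must agree in every dimension except the one being indexed. To make the message concrete, here is a minimal sketch of that rule in plain Python. `check_same_size_except_dim` is a hypothetical helper, not PyTorch's actual check, and the shapes in the usage lines are made up for illustration.

```python
def check_same_size_except_dim(input_shape, output_shape, dim):
    """Return the dimensions (other than `dim`) where the two shapes differ.

    Hypothetical helper that mirrors the wording of the error message;
    it is not PyTorch's implementation.
    """
    if len(input_shape) != len(output_shape):
        raise ValueError("shapes must have the same number of dimensions")
    return [
        d
        for d, (a, b) in enumerate(zip(input_shape, output_shape))
        if d != dim and a != b
    ]


# Made-up shapes: indexing along dim 1, so only dims 0 and 2 must match.
print(check_same_size_except_dim((128, 21, 4), (128, 1, 4), dim=1))  # []
print(check_same_size_except_dim((128, 21, 4), (256, 1, 4), dim=1))  # [0]
```

A non-empty result names the dimensions that would trip a check of this kind. Printing the shapes of the tensors involved just before the failing call is one way to see which dimension disagrees.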

@KevinQian97
Author

I found that the faster-rcnn code runs normally, but the fpn code fails. So I guess the problem is in fpn.py, though I still can't find the cause.
What's more, I was training on my own data; if I switch back to the original VOC2007 data, it works. That's strange, because I only converted my personal data into the VOC2007 format.
Here is one of my annotation files:

train
VIRAT_S_000000.mp4_0
C:/Users/Kevin Qian/Downloads/images/train/VIRAT_S_000000.mp4_0.jpg

Unknown
1920 1080 3
0
Other 0 636 723 655 787
Other 0 411 618 438 703
Person 0 349 709 410 850
Other 0 760 758 778 831
Person 0 1386 245 1432 354
Person 0 276 688 345 845
Other 0 512 687 541 747

And here is an annotation file from the original VOC2007:

VOC2007
009962.jpg
The VOC2007 Database
PASCAL VOC2007
flickr
246788553
Tool - Wroclaw
Milosz J.
500 375 3
0
chair Right 1 0 211 192 324 326
person Unspecified 1 0 162 72 273 248
person Right 1 0 250 68 473 312
person Right 1 0 4 2 253 374
diningtable Unspecified 1 1 358 216 500 375

@KevinQian97
Author

I solved the problem by downloading the whole PASCAL VOC dataset and replacing the data in it with my own, rather than using my personal dataset directly.
Interestingly, I believe your code is based on jwyang's. With this data-replacement approach I can train successfully with your code, but the same approach still fails with jwyang's code. Could you tell me whether you changed any code related to reading data from the dataset?
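
One possible reading of this fix is that the custom images and annotations were placed inside the downloaded VOC2007 directory tree; the comment does not spell this out, so treat it as an assumption. Under that reading, a minimal sketch for checking that the swapped-in data is complete could look like the following. It assumes the standard VOC2007 folder names (`ImageSets/Main`, `Annotations`, `JPEGImages`) and `.jpg` images; `find_missing_files` is a hypothetical helper.

```python
import os


def find_missing_files(voc_root, split="trainval"):
    """Return ids from ImageSets/Main/<split>.txt lacking an image or annotation.

    Assumes the standard VOC2007 folder names; hypothetical helper.
    """
    list_path = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    with open(list_path) as f:
        # Each non-empty line starts with an image id.
        image_ids = [line.split()[0] for line in f if line.strip()]
    missing = []
    for image_id in image_ids:
        xml_path = os.path.join(voc_root, "Annotations", image_id + ".xml")
        jpg_path = os.path.join(voc_root, "JPEGImages", image_id + ".jpg")
        if not (os.path.isfile(xml_path) and os.path.isfile(jpg_path)):
            missing.append(image_id)
    return missing
```

Calling `find_missing_files` on the VOC2007 folder (its location depends on your setup) returns an empty list when every listed id has both files.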

@jacco09

jacco09 commented Oct 20, 2018

I ran into the same problem. Could you share your solution in detail? Thanks!

@JingXiaolun

@KevinQian97, I ran into the same problem. Could you share your solution in detail? Thanks!

@hailey94

@KevinQian97, I ran into the same problem. Could you share your solution in detail? Thanks!
