for issue#339 #655

Closed
wants to merge 1 commit into from
imwxc commented Apr 1, 2021

Maybe an augmentation causes the target tensor to become empty (tensor([])). My solution is to comment out the Affine augmentation, which fixes the bug.

Closes #339

Collaborator

Flova commented Apr 2, 2021

But the Affine is a crucial part of the data augmentation process. I also don't understand how this relates to issues when using negative data. Or do you use positive data, and all boxes get moved out of the image by the augmentation, so that the image effectively becomes a negative sample? Negative data is also quite uncommon in this context: https://stackoverflow.com/questions/55202727/yolo-object-detection-include-images-that-do-not-contain-classes-to-be-predicte.

Author

imwxc commented Apr 2, 2021

Thanks for your advice.

I checked my dataset and found that the images causing the problem have some short-distance boxes, so I tried changing the translate_percent parameter from (-0.2, 0.2) to (-0.05, 0.05), and that also fixed the problem.

Collaborator

Flova commented Apr 2, 2021

Ah okay, this seems to support the hypothesis that we convert these samples into negative samples by moving the boxes out of the image. Thank you for your troubleshooting. Now we need to fix the issue that training fails on negative samples. Could you provide a complete stack trace of the error in the target building? The one in the issue is a bit short and outdated.

Collaborator

Flova commented Apr 5, 2021

I ran a few trials and was not able to reproduce this issue with the current master. I do get a tensor([], size=(0, 6)) tensor as the target, but it doesn't cause an exception. Maybe you are on an older version of this repo. Could you send me the commit you're on?

Author

imwxc commented Apr 5, 2021

Sorry for the late reply. I tried to get the original stack trace, but, perhaps because I updated PyTorch, the traceback now looks like this:

Traceback (most recent call last):
  File "train.py", line 109, in
    loss, outputs = model(imgs, targets)
  File "D:\ProgramData\Anaconda3\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Graduation_Project\YOLOv3-forusing\models.py", line 274, in forward
    yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

Collaborator

Flova commented Apr 5, 2021

Did you modify any parts of the code, the .cfg, or something similar? This trace says that you don't have any YOLO layers in your network, which is obviously a problem.
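To make this failure mode easier to spot, here is a minimal sketch of a check before the concatenation. It assumes the forward pass collects one output tensor per YOLO layer in a list, as the `torch.cat(yolo_outputs, 1)` line in the trace suggests. The helper name `concat_yolo_outputs` and its error message are hypothetical and are not part of the repository.

```python
import torch


def concat_yolo_outputs(yolo_outputs):
    """Concatenate per-layer detections along dim 1 (hypothetical helper).

    Raises a readable error when no YOLO layer produced an output,
    instead of letting torch.cat fail on an empty list.
    """
    if not yolo_outputs:
        raise ValueError(
            "No YOLO layer outputs were collected; "
            "check that the .cfg defines at least one [yolo] section."
        )
    return torch.cat(yolo_outputs, 1)


# Two fake layer outputs: (batch, num_boxes, box_attributes)
outputs = [torch.zeros(1, 3, 7), torch.zeros(1, 5, 7)]
print(concat_yolo_outputs(outputs).shape)  # torch.Size([1, 8, 7])
```

With an empty list the helper raises `ValueError` with a hint about the config, which points at the cause more directly than the dispatcher message in the trace.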

Author

imwxc commented Apr 5, 2021

Thanks for your advice. I checked my code and ran my task again. Here is the stack trace for the issue:

targets: tensor([], device='cuda:0', size=(0, 6))

imgs: tensor([[[[0.0000, 0.0000, 0.0000, ..., 0.9412, 0.9882, 0.9882],
[0.0000, 0.0000, 0.0000, ..., 0.5098, 0.9961, 0.9882],
[0.0000, 0.0000, 0.0000, ..., 0.9725, 0.9686, 0.9725],
...,
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]]]],
device='cuda:0')

Traceback (most recent call last):
  File "train.py", line 110, in
    loss, outputs = model(imgs, targets)
  File "D:\ProgramData\Anaconda3\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Graduation_Project\YOLOv3-forusing\models.py", line 270, in forward
    x, layer_loss = module[0](x, targets, img_dim)
  File "D:\ProgramData\Anaconda3\envs\Pytorch\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\Graduation_Project\YOLOv3-forusing\models.py", line 196, in forward
    ignore_thres=self.ignore_thres,
  File "D:\Graduation_Project\YOLOv3-forusing\utils\utils.py", line 303, in build_targets
    best_ious, best_n = ious.max(0)
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
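For readers hitting the same `ious.max(0)` error on older code, the following is a minimal sketch of a guard around that reduction. It assumes `ious` has shape (num_anchors, num_targets), which is what the empty `size=(0, 6)` target printed above would turn into a tensor with zero columns. The helper `best_anchor_per_target` is hypothetical and is not claimed to be the change made in the repository. How PyTorch treats reductions over tensors with no elements may differ between versions, so the sketch does not rely on that behaviour.

```python
import torch


def best_anchor_per_target(ious: torch.Tensor):
    """Return (best_ious, best_n) per target, tolerating zero targets.

    ious: tensor of shape (num_anchors, num_targets). Hypothetical helper.
    """
    if ious.numel() == 0:
        # No targets in this batch: skip the reduction and return empties.
        best_ious = ious.new_zeros((0,))
        best_n = torch.empty((0,), dtype=torch.long, device=ious.device)
        return best_ious, best_n
    # Regular case: highest IoU over the anchor dimension for each target.
    return ious.max(0)


# Negative sample: three anchors, zero targets.
values, indices = best_anchor_per_target(torch.zeros((3, 0)))
print(values.shape, indices.shape)

# Regular sample: three anchors, two targets.
ious = torch.tensor([[0.1, 0.7], [0.5, 0.2], [0.3, 0.9]])
values, indices = best_anchor_per_target(ious)
print(indices.tolist())  # [1, 2]
```

Code that consumes the result still has to cope with empty index tensors, so a guard like this is only one piece of handling negative samples.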

I also checked the target txt file of the sample that triggers the error. Here is its content:

3 0.07633587786259542 0.5251908396946565 0.07633587786259542 0.04122137404580153
0 0.1099236641221374 0.35 0.1099236641221374 0.06030534351145038
5 0.08015267175572519 0.45610687022900764 0.08015267175572519 0.10763358778625955
2 0.04351145038167939 0.47748091603053433 0.04351145038167939 0.08015267175572519
1 0.0648854961832061 0.6423664122137405 0.0648854961832061 0.0450381679389313
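As an illustration of how these labels can end up as an empty target, here is a minimal sketch in plain Python. It assumes the rows are (class, x_center, y_center, width, height) normalized to the image size, that a translation shifts boxes by a fraction of the image size with negative x meaning a shift to the left, and that boxes ending up fully outside the image are dropped while partly visible ones are kept. The helper `boxes_left_after_shift` is hypothetical and is not part of the repository or of the augmentation library.

```python
# Labels copied from the thread: (class, x_center, y_center, width, height).
LABELS = [
    (3, 0.07633587786259542, 0.5251908396946565, 0.07633587786259542, 0.04122137404580153),
    (0, 0.1099236641221374, 0.35, 0.1099236641221374, 0.06030534351145038),
    (5, 0.08015267175572519, 0.45610687022900764, 0.08015267175572519, 0.10763358778625955),
    (2, 0.04351145038167939, 0.47748091603053433, 0.04351145038167939, 0.08015267175572519),
    (1, 0.0648854961832061, 0.6423664122137405, 0.0648854961832061, 0.0450381679389313),
]


def boxes_left_after_shift(labels, dx, dy=0.0):
    """Return the labels whose box still overlaps the image after a shift.

    dx, dy are fractions of the image width/height (assumed convention).
    """
    kept = []
    for cls, xc, yc, w, h in labels:
        # Corner coordinates of the shifted box.
        x1, x2 = xc - w / 2 + dx, xc + w / 2 + dx
        y1, y2 = yc - h / 2 + dy, yc + h / 2 + dy
        # Keep the box only if some part of it is still inside [0, 1] x [0, 1].
        if x2 > 0.0 and x1 < 1.0 and y2 > 0.0 and y1 < 1.0:
            kept.append((cls, xc + dx, yc + dy, w, h))
    return kept


print(len(boxes_left_after_shift(LABELS, dx=-0.2)))   # 0
print(len(boxes_left_after_shift(LABELS, dx=-0.05)))  # 5
```

Every box here has its right edge at x < 0.2, so a 20% shift to the left removes all of them, whereas a 5% shift leaves each box at least partly visible. Under the stated assumptions, that is consistent with the error disappearing once the range was narrowed.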

I also checked my image, and here is my image data:

[Image: 241]

Collaborator

Flova commented Apr 5, 2021

Thank you for your detailed information! It seems like your code is not up to date. Could you provide the commit hash of your HEAD? It seems like your code uses ignore_thres=self.ignore_thres which is not in the current codebase.

Collaborator

Flova commented Apr 5, 2021

This PR on the other hand seems up to date.

Author

imwxc commented Apr 5, 2021

Thanks! The commit hash is 24381e5, which is from 11 days ago. The code seems out of date, so I'll update it. Thanks again!

Collaborator

Flova commented Apr 5, 2021

Commit 24381e5 should be fine imo. But it doesn't line up with the stack trace. That's weird.

Collaborator

Flova commented Apr 5, 2021

The stack trace shows a state previous to #646.

@Flova Flova closed this Apr 17, 2021