Zero mAP and no detections on custom dataset #61

Closed
ghost opened this issue Feb 23, 2021 · 14 comments
ghost commented Feb 23, 2021

Dear Haotian Liu,
I'm currently trying yolact_edge trained on a custom COCO-like dataset with a single class ('person'). Unlike COCO, the image resolution is 512.

When I run evaluation with TensorRT conversion, I get the following error during the protonet conversion:

[02/23 15:26:23 yolact.eval]: Converting protonet to TensorRT...
Traceback (most recent call last):
  File "eval.py", line 1241, in <module>
    convert_to_tensorrt(net, cfg, args, transform=BaseTransform())
  File "/home/oidpsv/yolact_edge/utils/tensorrt.py", line 156, in convert_to_tensorrt
    net.to_tensorrt_protonet(cfg.torch2trt_protonet_int8, calibration_dataset=calibration_protonet_dataset, batch_size=args.trt_batch_size)
  File "/home/oidpsv/yolact_edge/yolact.py", line 1565, in to_tensorrt_protonet
    self.trt_load_if("proto_net", trt_fn, [x], int8_mode, batch_size=batch_size)
  File "/home/oidpsv/yolact_edge/yolact.py", line 1534, in trt_load_if
    module = trt_fn(module, trt_fn_params)
  File "/opt/conda/lib/python3.6/site-packages/torch2trt-0.1.0-py3.6-linux-x86_64.egg/torch2trt/torch2trt.py", line 555, in torch2trto
    engine = builder.build_cuda_engine(network)
  File "/opt/conda/lib/python3.6/site-packages/torch2trt-0.1.0-py3.6-linux-x86_64.egg/torch2trt/calibration.py", line 51, in get_batch
    buffer[i].copy_(tensor)
RuntimeError: The size of tensor a (69) must match the size of tensor b (64) at non-singleton dimension 2

This can be fixed by changing line 1563 in yolact.py from x = torch.ones((1, 256, 69, 69)).cuda() to x = torch.ones((1, 256, 64, 64)).cuda().

The problem is that, with this change, further evaluation with TensorRT conversion gives zero mAP, and processing images produces empty results (no masks or boxes). Could you please help me? Many thanks.
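
For reference, the 69 vs. 64 mismatch seems to come from the protonet taking the P3 feature map as input, whose spatial size scales with the configured image size (roughly three stride-2 stages, i.e. about size/8 with rounding up). A minimal sketch of that arithmetic, with the number of downsampling stages as an assumption:

import math

# Rough sketch: P3 (the protonet input) is ~1/8 of the input resolution,
# via three stride-2 stages that round up. The stage count is an assumption.
def p3_size(img_size, num_stride2_stages=3):
    size = img_size
    for _ in range(num_stride2_stages):
        size = math.ceil(size / 2)
    return size

print(p3_size(550))  # 69 -> the default (1, 256, 69, 69) calibration tensor
print(p3_size(512))  # 64 -> the shape calibration actually sees at resolution 512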

@haotian-liu (Collaborator)

What is the mAP without TensorRT? Also, did you set the classes to ('person') or ('person',)? The first one is incorrect.
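
The trailing comma matters because ('person') is just a parenthesized string in Python, while ('person',) is a one-element tuple; any code that iterates over the class list would otherwise see individual characters. A quick illustration:

classes_wrong = ('person')   # no trailing comma: this is the string 'person'
classes_right = ('person',)  # trailing comma: a 1-element tuple

print(len(classes_wrong), list(classes_wrong))  # 6 ['p', 'e', 'r', 's', 'o', 'n']
print(len(classes_right), list(classes_right))  # 1 ['person']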

ghost commented Feb 24, 2021

It is in the range of 70-80 without TensorRT. Also, the classes are set as ('person',) and the label map is {1: 1}.
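
For anyone hitting this later, here is a minimal sketch of how such a single-class dataset is typically declared in a YOLACT-style config; the dataset name and paths below are placeholders, not taken from this thread:

my_person_dataset = dataset_base.copy({
    'name': 'My Person Dataset',               # placeholder name
    'train_images': 'path/to/train/images',
    'train_info': 'path/to/train/annotations.json',
    'valid_images': 'path/to/valid/images',
    'valid_info': 'path/to/valid/annotations.json',
    'class_names': ('person',),                # trailing comma: 1-element tuple
    'label_map': {1: 1},                       # COCO category id -> contiguous class id
})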

@haotian-liu (Collaborator)

If you evaluate with FP16 TensorRT, does the AP improve?

ghost commented Feb 24, 2021

Unfortunately it is the same with FP16: zero mAP.

@haotian-liu (Collaborator)

Please try the latest code base in the haotian-dev branch.


XYunaaa commented Mar 1, 2021

Hi @oidpsv, I also encountered this problem. Have you found a solution yet?

@haotian-liu (Collaborator)

@XYunaaa Please see the newest code base and use the safe mode as explained here.


Jason-Lee0 commented Mar 16, 2021

@haotian-liu Sorry to bother you.
I have encountered a similar issue, but in my case I replaced the backbone with ReXNetV1 (https://github.com/clovaai/rexnet/blob/master/rexnetv1.py).

The model evaluates normally when I disable TensorRT, but when I convert it with the TensorRT command, an error appears.

First, I added the following code to the TRT modules:

        elif cfg.backbone.name == "MobileNetV2":
            x = [
                torch.randn(1, 32, 69, 69).cuda(),
                torch.randn(1, 64, 35, 35).cuda(),
                torch.randn(1, 160, 18, 18).cuda(),
            ]
        elif cfg.backbone.name == "MobileNetV3":
            x = [
                torch.randn(1, 24, 69, 69).cuda(),
                torch.randn(1, 40, 35, 35).cuda(),
                torch.randn(1, 40, 18, 18).cuda(),
            ]
        elif cfg.backbone.name == "Rexnet":
            x = [
                torch.randn(1, 16, 69, 69).cuda(),
                torch.randn(1, 27, 35, 35).cuda(),
                torch.randn(1, 50, 18, 18).cuda(),
            ]

        if cfg.backbone.name == "ResNet50" or cfg.backbone.name == "ResNet101":
            x = [
                torch.randn(1, 256, 69, 69).cuda(),
                torch.randn(1, 256, 35, 35).cuda(),
                torch.randn(1, 256, 18, 18).cuda(),
            ]
        elif cfg.backbone.name == "MobileNetV2":
            x = [
                torch.randn(1, 256, 69, 69).cuda(),
                torch.randn(1, 256, 35, 35).cuda(),
                torch.randn(1, 256, 18, 18).cuda(),
            ]
        elif cfg.backbone.name == "MobileNetV3":
            x = [
                torch.randn(1, 256, 69, 69).cuda(),
                torch.randn(1, 256, 35, 35).cuda(),
                torch.randn(1, 256, 18, 18).cuda(),
            ]
        elif cfg.backbone.name == "Rexnet":
            x = [
                torch.randn(1, 256, 69, 69).cuda(),
                torch.randn(1, 256, 35, 35).cuda(),
                torch.randn(1, 256, 18, 18).cuda(),
            ]

        if cfg.backbone.name == "ResNet50" or cfg.backbone.name == "ResNet101":
            x = torch.randn(1, 512, 69, 69).cuda()
        elif cfg.backbone.name == "MobileNetV2":
            x = torch.randn(1, 32, 69, 69).cuda()
        elif cfg.backbone.name == "MobileNetV3":
            x = torch.randn(1, 24, 69, 69).cuda()
        elif cfg.backbone.name == "Rexnet":
            x = torch.randn(1, 16, 69, 69).cuda()

The settings for MobileNetV3 and ReXNet follow the format of the existing code.

When I use the model with the ReXNet backbone, the following error appears.

[error screenshot]

I also changed line 1512:

x = torch.ones((1, 256, 69, 69)).cuda() -> x = torch.ones((1, 256, 112, 112)).cuda()

The error no longer appears, but I still get zero mAP, so I don't think that line is the problem.

I think the problem may be caused by my torch.randn settings.

Could you tell me the correct way to set the torch.randn shapes? How do I determine the right parameters for each TRT module?

Thanks for reading~

Have a nice day.

BTW, I have already used the --use_tensorrt_safe_mode flag; the error still occurs.

@haotian-liu (Collaborator)

@ntut108318099 You should use torch.randn(n, c, h, w), where (n, c, h, w) is the expected shape of the tensor for all images passing through that module.

@Jason-Lee0

@haotian-liu Thanks for your reply.

Sorry, could you tell me where I can find the expected shape of the tensor for all images passing through each module? Which code can I use to see it?

Thanks for your help.

@haotian-liu (Collaborator)

@ntut108318099

Each module, e.g. FPN_phase_1, has some inputs, e.g. def forward(self, x1, x2, x3). There we put x = [torch.randn(*x1.shape), torch.randn(*x2.shape), torch.randn(*x3.shape)], replacing the shapes with the actual numbers.
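
One way to find those numbers (a debugging sketch, not code from the repo; the attribute names below are illustrative) is to register forward hooks on the modules you plan to convert and run a single image through the model:

import torch

def log_input_shapes(name):
    # Print the shapes of all tensor inputs reaching a module's forward().
    def hook(module, inputs, output):
        shapes = [tuple(t.shape) for t in inputs if isinstance(t, torch.Tensor)]
        print(name, shapes)
    return hook

# Illustrative usage, assuming `net` is the loaded Yolact model:
# net.fpn_phase_1.register_forward_hook(log_input_shapes("fpn_phase_1"))
# net.proto_net.register_forward_hook(log_input_shapes("proto_net"))
# net(torch.randn(1, 3, cfg.max_size, cfg.max_size).cuda())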

@Jason-Lee0

@haotian-liu

Following your advice, I converted the TRT modules successfully.
But it shows another error: shape '[1, 25575, 81]' is invalid for input of size 1559088.

[error screenshot]

Have you ever seen this error before?
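
For what it's worth, the numbers in that error look like a prior-count mismatch: 1559088 = 19248 × 81, and 19248 is the prior count for the default 550-input pyramid (69², 35², 18², 9², 5² locations × 3 anchors), while 25575 would correspond to an 80/40/20/10/5 pyramid. A quick check of that arithmetic (the anchor layout is an assumption about the default config):

def num_priors(feature_sizes, anchors_per_loc=3):
    # Total anchor boxes over a feature pyramid.
    return sum(s * s for s in feature_sizes) * anchors_per_loc

num_classes = 81  # 80 COCO classes + background

print(num_priors([69, 35, 18, 9, 5]))                # 19248
print(num_priors([69, 35, 18, 9, 5]) * num_classes)  # 1559088 -> the tensor size in the error
print(num_priors([80, 40, 20, 10, 5]))               # 25575   -> the prior count in '[1, 25575, 81]'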

@Jason-Lee0

@haotian-liu Hi, sorry to bother you.

I have fixed the problem.

It was caused by PredictionModuleTRTWrapper; I didn't notice that some parameters there also needed to be set.

The result looks great!
Thanks for your support.

Have a nice day.

@haotian-liu (Collaborator)

@ntut108318099 Glad you have the problem solved!
