2 questions about calibrations #38

Closed
sdimantsd opened this issue Jan 24, 2021 · 33 comments

Comments

@sdimantsd

sdimantsd commented Jan 24, 2021

  1. Do I need an annotations file for calibration?
  2. I am using TensorRT FP16 optimization (--use_fp16_tensorrt), but the network doesn't find even one object (without TensorRT it works perfectly).
    Is it possible that FP16 doesn't do the calibration?
    It looks like it from the code (https://github.com/haotian-liu/yolact_edge/blob/662d760f8b2d8b4409d385aaf172e155aaa3a3d8/utils/tensorrt.py#L38)

Thanks

@haotian-liu
Collaborator

  1. No, you don't; just put the calibration images in a directory and specify it with the --calib_images option (see the sketch after this list).
  2. FP16 does not have the option to calibrate. This is weird, as in our experiments (in the paper), converting to FP16 affects the AP very little. So, AP-wise, what is the difference between the FP32 AP and the FP16 AP? And if you evaluate with our pretrained models, does FP16 conversion give you good results?
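
As a minimal sketch (the image paths here are hypothetical, the model and flags are the ones used elsewhere in this thread, and whether INT8 calibration is triggered by default or needs an extra flag depends on the eval.py options in your checkout):

Command: mkdir -p ./calib_images && cp /path/to/your/images/*.jpg ./calib_images/
Command: python3 eval.py --trained_model=./weights/yolact_edge_54_800000.pth --calib_images=./calib_images --score_threshold=0.3 --top_k=100 --image=cars.png:seg_out_int8.jpg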

@sdimantsd
Author

Thanks for your response :-)

On the COCO dataset with your weights it looks the same (with TensorRT FP16 and without TensorRT), but on my dataset the results are much worse...
Here are 2 images for example:

Image 1 (without TensorRT):
Command: python3 eval.py --yolact_transfer --disable_tensorrt --trained_model=./weights/yolact_resnet101_im400_12_340000.pth --images=/home/ws/images/imgs_in/:/home/ws/imgs_out --top_k=10 --score_threshold=0.3

Image 2 (with TensorRT FP16):
Command: python3 eval.py --yolact_transfer --use_fp16_tensorrt --trained_model=./weights/yolact_resnet101_im400_12_340000.pth --images=/home/ws/images/imgs_in/:/home/ws/imgs_out --top_k=10 --score_threshold=0.3

As you can see, with TensorRT FP16 the network didn't recognize even one car!

Can I train the network with FP16 optimization?
How can I do it?

Thanks! :-)

@haotian-liu
Collaborator

Hi, I tried the model on your image with FP32/FP16/INT8, and the results are reasonable. I uploaded them here.

The command and the model I am using (FP16 as an example):
python eval.py --trained_model=./weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --use_fp16_tensorrt --image=cars.png:seg_out_fp16.jpg

Can you try with the model I am using and see whether it is due to the model file or to TensorRT? Also, please pull the newest commit from our repo, because the newest code will automatically convert YOLACT weights.

@sdimantsd
Author

I will try it now.
What about training with TensorRT FP16/INT8?

@haotian-liu
Collaborator

We do not need to train with FP16. Simply train a full-precision model, and you can convert the trained model to FP16/INT8 using TensorRT.
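
For example (a sketch only: my_model.pth and the image paths are placeholders, and the flags are taken from the commands earlier in this thread), a single evaluation run with --use_fp16_tensorrt performs the conversion, and the generated TensorRT engines are cached next to the weights as *.trt files:

Command: python3 eval.py --yolact_transfer --use_fp16_tensorrt --trained_model=./weights/my_model.pth --images=/path/to/imgs_in/:/path/to/imgs_out --top_k=10 --score_threshold=0.3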

@sdimantsd
Author

(Attached image: yolact_edge_fp16)
This is the result with TensorRT FP16; the results without TensorRT look the same.

If it matters, I am using a Jetson Nano (and this is the reason I need FP16 and not INT8: Jetson Nano does not support INT8).

@haotian-liu
Collaborator

Can you try removing all TensorRT caches with rm /path/to/weights/*.trt and re-running the model with your previous command? It might be that you once ran the evaluation without --yolact_transfer and the converted TensorRT model (with incorrect weights) was stored in the cache.

@sdimantsd
Author

This is what I did:
ws@PC_2:~/DL/yolact_edge$ rm weights/*.trt
ws@PC_2:~/DL/yolact_edge$ ls weights/
yolact_edge_54_800000.pth yolact_nets_shuk_resnet101_im400_12_340000.pth
yolact_edge_vid_847_50000.pth yolact_resnet101_im350_low_height_58_460000.pth
ws@PC_2:~/DL/yolact_edge$ python3 eval.py --use_fp16_tensorrt --trained_model=./weights/yolact_nets_shuk_resnet101_im400_12_340000.pth --images=/home/ws/images/imgs_in/:/home/ws/imgs_out --top_k=10 --score_threshold=0.3
(matplotlib 3.3 rcParam deprecation warnings omitted)
Config not specified. Parsed yolact_nets_shuk_resnet101_im400_config from the file name.

[01/25 11:22:15 yolact.eval]: Loading model...
[01/25 11:22:34 yolact.eval]: Model loaded.
[01/25 11:22:34 yolact.eval]: Converting to TensorRT...
[01/25 11:22:34 yolact.eval]: Converting backbone to TensorRT...
[01/25 11:23:54 yolact.eval]: Converting protonet to TensorRT...
[01/25 11:24:15 yolact.eval]: Converting FPN to TensorRT...
Warning: Encountered known unsupported method torch.zeros
[01/25 11:24:38 yolact.eval]: Converting PredictionModule to TensorRT...
[01/25 11:24:59 yolact.eval]: Converted to TensorRT.

/home/ws/images/imgs_in/cars.jpg -> /home/ws/imgs_out/cars.png

but the results are the same...


@sdimantsd
Author

The strikethrough over the first lines is because of the '~' characters in the Linux prompt.

@sdimantsd
Author

BTW, this is the config:

yolact_nets_shuk_resnet101_im400_config = yolact_edge_config.copy({
    'name': 'yolact_nets_shuk_resnet101_im400',

    # Dataset stuff
    'dataset': nets_shuk_dataset,
    'num_classes': len(nets_shuk_dataset.class_names) + 1,

    'masks_to_train': 100,
    'max_num_detections': 50,
    'max_size': 400,

    'backbone': yolact_base_config.backbone.copy({
        'pred_scales': [[int(x[0] / yolact_base_config.max_size * 400)] for x in
                        yolact_base_config.backbone.pred_scales],
    }),
})
NETS_CAR_TRUCK_BUS = ('car', 'bus', 'truck')
NETS_CAR_TRUCK_BUS_LABEL_MAP = {1: 1, 2: 2, 3: 3}

@haotian-liu
Collaborator

So you basically trained your own model on this dataset with an image size of 400x400? Would you mind sharing the trained model with me by email so that I can test it? We haven't seen a model with such a huge performance difference between FP16 and FP32.

@sdimantsd
Author

Yes, I can share it with you.
What is your email?

Thanks

@haotian-liu
Collaborator

liuhaotian.cn at gmail

@sdimantsd
Author

OK, I am not at work right now (it's 21:00 our time).
I will send it tomorrow.

@sdimantsd
Author

@haotian-liu I sent it now. Thanks

@haotian-liu
Collaborator

My collaborator and I will take a look later this week, and will let you know with the updates, thanks.

@sdimantsd
Author

Thanks!

@sdimantsd
Author

Hi @haotian-liu
Anything new about it?

@haotian-liu
Collaborator

@sdimantsd Hi, we found that it is due to the TensorRT conversion of the prediction module/FPN: when we disable these two conversions and only use the backbone/protonet conversion, everything works fine. Could you try this on your model/dataset? We also found that native PyTorch FP16 conversion works fine. We have decided to contact the upstream TensorRT and torch2trt maintainers for more information and help.

@sdimantsd
Author

Thanks!
How can I disable conversion for the FPN?

@haotian-liu
Collaborator

Setting these two options to False in the config allows you to disable TensorRT for the FPN (similarly for the other modules):

{
    'torch2trt_fpn': False,
    'torch2trt_fpn_int8': False,
}
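
The prediction module can presumably be disabled the same way; the key names below follow the same naming pattern but are an assumption, so check the config file in your checkout for the exact option names:

{
    'torch2trt_prediction_module': False,
    'torch2trt_prediction_module_int8': False,
}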

@haotian-liu
Collaborator

Hi, I am closing this issue and merging the discussion related to TensorRT conversion issues after training on a custom dataset into issue #47, as it is quite hard for me to track so many open issues. I hope you understand, thanks.

@haotian-liu
Collaborator

haotian-liu commented Feb 7, 2021

I somehow figured out the cause and applied a fix; details of the solution are explained in #47. Please take a look to see if the issue is resolved.
If the issue persists, please reply directly to #47 (this will be the main thread for related issues for now) with your experiment configuration (details are also explained there). Thanks.

@sdimantsd
Author

OK.
Thanks :-)

@chingi071

@sdimantsd Hello, I would like to ask: do you run inference on a Jetson Nano? The backbone is ResNet-101 and the image size is 400, right? How much memory did you use during the conversion to TensorRT and during inference? I couldn't run inference on a Jetson Nano 2GB; the process was killed. My backbone is MobileNetV2, and I tried image sizes of 320, 160, and 80. I'm considering whether to switch to a Jetson Nano 4GB. In addition, will pred_scales affect the result? I see that you have changed them. Thank you.
I am also very grateful to @haotian-liu for open-sourcing this; it is very good work.

@haotian-liu
Collaborator

@chingi071 You can try setting cfg.torch2trt_max_calibration_images to a lower value (e.g. 5); if it still OOMs, you might set it to use TensorRT FP16 with --use_fp16_tensorrt.
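
A minimal sketch of one place to set it (assuming a custom config that copies yolact_edge_config, in the style of the config posted earlier in this thread; the config name and the value 5 are only illustrative):

my_custom_config = yolact_edge_config.copy({
    # ... your existing dataset/backbone options ...
    # Use fewer images for INT8 calibration to reduce peak memory.
    'torch2trt_max_calibration_images': 5,
})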

@chingi071

@haotian-liu Hello, I tried setting cfg.torch2trt_max_calibration_images to a smaller value (1, 5) and used --use_fp16_tensorrt, but it was still killed... My current resolution is 320; do I need to set it smaller?

@haotian-liu
Collaborator

What if you only use the PyTorch version? We haven't tested our method on the Jetson Nano, so I cannot provide much advice.

@chingi071

I am using the PyTorch version. Then I will try a Jetson Nano 4GB to see if it can run inference. Thank you.

@haotian-liu
Collaborator

@chingi071 Not sure if I was being clear: I mean use --disable_tensorrt for pure PyTorch inference.

@chingi071

Oh, I misunderstood. I haven't tried --disable_tensorrt. I will try it, thank you very much!

@sdimantsd
Author

Hi @chingi071
I am using a Jetson Nano 4GB, not 2GB.
With 4GB it works with an input size of 500x500 and FP16, at ~540 ms per frame (around 1.85 FPS).
Hope that helps you.

@chingi071

@sdimantsd Thank you very much! This is very useful information for me.
