
[Training is in progress] [Feature] Support RT-DETR #10498

Open · wants to merge 21 commits into base: dev-3.x

Conversation

@nijkah (Contributor) commented Jun 13, 2023

Checklist

  • Reproduce with pre-trained weights
  • Reproduce training
  • Unit tests
  • Complete docstrings (type hints)

Motivation

Support RT-DETR (https://arxiv.org/abs/2304.08069).
Resolves #10186

Consideration

  1. In RT-DETR, the transformer encoder is applied as a neck.
  2. In RT-DETR, bbox_heads are used only in the transformer decoder.
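
A rough config-style sketch of this layout (the class and field names here are illustrative assumptions, not necessarily this PR's final API):

model = dict(
    type='RTDETR',
    backbone=dict(type='ResNetV1d', depth=50),  # e.g. r50vd
    # the RT-DETR transformer encoder occupies the neck slot
    neck=dict(type='HybridEncoder', num_encoder_layers=1),
    # bbox heads are attached only on the transformer decoder side
    bbox_head=dict(type='RTDETRHead', num_classes=80))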

Modification

COCO val evaluation:

06/13 11:28:51 - mmengine - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=7.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=140.32s).
Accumulating evaluation results...
DONE (t=37.14s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.531
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.713
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.577
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.348
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.580
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.700
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.723
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.725
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.725
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.550
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.767
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.882
06/13 11:32:08 - mmengine - INFO - bbox_mAP_copypaste: 0.531 0.713 0.577 0.348 0.580 0.700
06/13 11:32:10 - mmengine - INFO - Epoch(test) [5000/5000]    coco/bbox_mAP: 0.5310  coco/bbox_mAP_50: 0.7130  coco/bbox_mAP_75: 0.5770  coco/bbox_mAP_s: 0.3480  coco/bbox_mAP_m: 0.5800  coco/bbox_mAP_l: 0.7000  data_time: 0.0048  time: 0.2935

@nijkah nijkah changed the base branch from main to dev-3.x June 13, 2023 11:59
@nijkah (Contributor, Author) commented Jun 19, 2023

Current Status

Training performance is not reproduced yet.

With the current config, I got the result below. The performance fluctuates after epoch 18.

...
2023/06/14 18:24:42 - mmengine - INFO - Epoch(val) [6][2500/2500]    coco/bbox_mAP: 0.4540  coco/bbox_mAP_50: 0.6260  coco/bbox_mAP_75: 0.4910  coco/bbox_mAP_s: 0.2580  coco/bbox_mAP_m: 0.5020  coco/bbox_mAP_l: 0.6400  data_time: 0.0018  time: 0.0309
2023/06/14 23:26:23 - mmengine - INFO - Epoch(val) [12][2500/2500]    coco/bbox_mAP: 0.4850  coco/bbox_mAP_50: 0.6610  coco/bbox_mAP_75: 0.5220  coco/bbox_mAP_s: 0.2860  coco/bbox_mAP_m: 0.5310  coco/bbox_mAP_l: 0.6720  data_time: 0.0016  time: 0.0312
2023/06/15 04:30:59 - mmengine - INFO - Epoch(val) [18][2500/2500]    coco/bbox_mAP: 0.4960  coco/bbox_mAP_50: 0.6750  coco/bbox_mAP_75: 0.5370  coco/bbox_mAP_s: 0.2950  coco/bbox_mAP_m: 0.5430  coco/bbox_mAP_l: 0.6850  data_time: 0.0018  time: 0.0316
...
2023/06/17 05:27:59 - mmengine - INFO - Epoch(val) [72][2500/2500]    coco/bbox_mAP: 0.4960  coco/bbox_mAP_50: 0.6780  coco/bbox_mAP_75: 0.5320  coco/bbox_mAP_s: 0.3070  coco/bbox_mAP_m: 0.5470  coco/bbox_mAP_l: 0.6680  data_time: 0.0015  time: 0.0305

I suspected the cause was a difference in the transforms of the training dataset, so I edited them as below.
The main differences lie in some hyperparameters and the position of RandomCrop.

train_pipeline = [
    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Expand',
        mean=[123.675, 116.28, 103.53],
        to_rgb=True,
        ratio_range=(1, 4)),
    dict(type='RandomCrop', crop_size=(0.3, 1.0), crop_type='relative_range'),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoiceResize',
        scales=[(480, 480), (512, 512), (544, 544), (576, 576),
                (608, 608), (640, 640), (640, 640), (640, 640),
                (672, 672), (704, 704), (736, 736), (768, 768),
                (800, 800)],
        keep_ratio=False),
    dict(type='PackDetInputs')
]

However, it shows slower convergence and a worse result.

2023/06/16 13:03:15 - mmengine - INFO - Epoch(val) [6][2500/2500]    coco/bbox_mAP: 0.3670  coco/bbox_mAP_50: 0.5340  coco/bbox_mAP_75: 0.3950  coco/bbox_mAP_s: 0.1730  coco/bbox_mAP_m: 0.4070  coco/bbox_mAP_l: 0.5530  data_time: 0.0019  time: 0.0317
2023/06/16 19:28:04 - mmengine - INFO - Epoch(val) [12][2500/2500]    coco/bbox_mAP: 0.4230  coco/bbox_mAP_50: 0.5980  coco/bbox_mAP_75: 0.4560  coco/bbox_mAP_s: 0.2140  coco/bbox_mAP_m: 0.4680  coco/bbox_mAP_l: 0.6230  data_time: 0.0017  time: 0.0315
2023/06/17 01:52:12 - mmengine - INFO - Epoch(val) [18][2500/2500]    coco/bbox_mAP: 0.4440  coco/bbox_mAP_50: 0.6220  coco/bbox_mAP_75: 0.4780  coco/bbox_mAP_s: 0.2440  coco/bbox_mAP_m: 0.4890  coco/bbox_mAP_l: 0.6390  data_time: 0.0016  time: 0.0312
2023/06/17 08:18:02 - mmengine - INFO - Epoch(val) [24][2500/2500]    coco/bbox_mAP: 0.4580  coco/bbox_mAP_50: 0.6380  coco/bbox_mAP_75: 0.4950  coco/bbox_mAP_s: 0.2500  coco/bbox_mAP_m: 0.5050  coco/bbox_mAP_l: 0.6520  data_time: 0.0021  time: 0.0323
...
06/19 12:47:44 - mmengine - INFO - Epoch(val) [72][2500/2500]    coco/bbox_mAP: 0.4900  coco/bbox_mAP_50: 0.6750  coco/bbox_mAP_75: 0.5270  coco/bbox_mAP_s: 0.3030  coco/bbox_mAP_m: 0.5380  coco/bbox_mAP_l: 0.6780  data_time: 0.0017  time: 0.0317

I'm still trying to figure this out.

@hhaAndroid (Collaborator) commented:

> (quoting @nijkah's status update above)

You can refer to the rt-detr reproduction in yolov8; maybe you will find something new.

@nijkah (Contributor, Author) commented Jun 23, 2023

Unfortunately, it seems that the maintainers of yolov8 did not reproduce the performance reported in the paper.
The default hyperparameters differ considerably from the paper's.

from ultralytics import RTDETR

model = RTDETR()
model.info() # display model information
model.train(data="coco.yaml") # train
model.predict("path/to/image.jpg") # predict

log

rt-detr-l summary: 673 layers, 32970476 parameters, 32970476 gradients
Ultralytics YOLOv8.0.120 🚀 Python-3.8.5 torch-1.9.1+cu111 CUDA:0
yolo/engine/trainer: task=detect, mode=train, model=None, data=coco.yaml, epochs=100, patience=50, 
batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, 
name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=False, 
single_cls=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, fraction=1.0, profile=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, 
conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, 
visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, 
format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, 
warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, 
nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, 

Now I'm going to compare it with the one in ppdet, module by module.

@hhaAndroid (Collaborator) commented:

https://github.com/lyuwenyu/RT-DETR @nijkah

@nijkah (Contributor, Author) commented Jul 4, 2023

> https://github.com/lyuwenyu/RT-DETR @nijkah

@hhaAndroid I'm back from holidays. Sorry for the late progress.

It seems that the repository only provides the inference code.
I still think the performance difference comes from the training part.
The prediction output is almost the same between the migrated model and the original one, as shown below:

[image: left, demo from MMDet; right, demo from PaddleDet]

The training loss looks quite different between the ported model and the original one, especially loss_class.

# mmdet 8bs x 2GPU
07/04 08:06:25 - mmengine - INFO - Epoch(train)  [1][  50/7393]  base_lr: 2.5488e-06 lr: 2.5488e-06  eta: 3 days, 13:53:13  time: 0.5809  data_time: 0.0409  memory: 14765  grad_norm: 25.0602  loss: 44.3442  loss_cls: 0.2494  loss_bbox: 1.5884  loss_iou: 1.8367  d0.loss_cls: 0.2357  d0.loss_bbox: 1.6084  d0.loss_iou: 1.8593  d1.loss_cls: 0.2374  d1.loss_bbox: 1.6023  d1.loss_iou: 1.8487  d2.loss_cls: 0.2411  d2.loss_bbox: 1.5994  d2.loss_iou: 1.8468  d3.loss_cls: 0.2436  d3.loss_bbox: 1.5953  d3.loss_iou: 1.8439  d4.loss_cls: 0.2500  d4.loss_bbox: 1.5910  d4.loss_iou: 1.8398  enc_loss_cls: 0.2332  enc_loss_bbox: 1.6214  enc_loss_iou: 1.8650  dn_loss_cls: 0.8697  dn_loss_bbox: 0.9152  dn_loss_iou: 1.2991  d0.dn_loss_cls: 0.8708  d0.dn_loss_bbox: 0.9153  d0.dn_loss_iou: 1.2993  d1.dn_loss_cls: 0.8600  d1.dn_loss_bbox: 0.9153  d1.dn_loss_iou: 1.2992  d2.dn_loss_cls: 0.8680  d2.dn_loss_bbox: 0.9153  d2.dn_loss_iou: 1.2992  d3.dn_loss_cls: 0.8730  d3.dn_loss_bbox: 0.9152  d3.dn_loss_iou: 1.2992  d4.dn_loss_cls: 0.8789  d4.dn_loss_bbox: 0.9152  d4.dn_loss_iou: 1.2992
07/04 08:06:52 - mmengine - INFO - Epoch(train)  [1][ 100/7393]  base_lr: 5.0475e-06 lr: 5.0475e-06  eta: 3 days, 12:12:11  time: 0.5582  data_time: 0.0150  memory: 14765  grad_norm: 26.7655  loss: 43.8603  loss_cls: 0.2484  loss_bbox: 1.5579  loss_iou: 1.8946  d0.loss_cls: 0.2508  d0.loss_bbox: 1.5876  d0.loss_iou: 1.9174  d1.loss_cls: 0.2440  d1.loss_bbox: 1.5792  d1.loss_iou: 1.9120  d2.loss_cls: 0.2436  d2.loss_bbox: 1.5732  d2.loss_iou: 1.9078  d3.loss_cls: 0.2440  d3.loss_bbox: 1.5698  d3.loss_iou: 1.9029  d4.loss_cls: 0.2459  d4.loss_bbox: 1.5625  d4.loss_iou: 1.9000  enc_loss_cls: 0.2519  enc_loss_bbox: 1.5984  enc_loss_iou: 1.9295  dn_loss_cls: 0.7495  dn_loss_bbox: 0.8637  dn_loss_iou: 1.3038  d0.dn_loss_cls: 0.8425  d0.dn_loss_bbox: 0.8633  d0.dn_loss_iou: 1.3045  d1.dn_loss_cls: 0.8067  d1.dn_loss_bbox: 0.8633  d1.dn_loss_iou: 1.3044  d2.dn_loss_cls: 0.7926  d2.dn_loss_bbox: 0.8634  d2.dn_loss_iou: 1.3042  d3.dn_loss_cls: 0.7785  d3.dn_loss_bbox: 0.8635  d3.dn_loss_iou: 1.3041  d4.dn_loss_cls: 0.7633  d4.dn_loss_bbox: 0.8636  d4.dn_loss_iou: 1.3039
07/04 08:07:20 - mmengine - INFO - Epoch(train)  [1][ 150/7393]  base_lr: 7.5463e-06 lr: 7.5463e-06  eta: 3 days, 11:36:14  time: 0.5576  data_time: 0.0148  memory: 14765  grad_norm: 24.9964  loss: 44.6943  loss_cls: 0.2648  loss_bbox: 1.6522  loss_iou: 1.9208  d0.loss_cls: 0.2474  d0.loss_bbox: 1.6940  d0.loss_iou: 1.9675  d1.loss_cls: 0.2439  d1.loss_bbox: 1.6920  d1.loss_iou: 1.9536  d2.loss_cls: 0.2485  d2.loss_bbox: 1.6766  d2.loss_iou: 1.9466  d3.loss_cls: 0.2559  d3.loss_bbox: 1.6665  d3.loss_iou: 1.9365  d4.loss_cls: 0.2575  d4.loss_bbox: 1.6597  d4.loss_iou: 1.9299  enc_loss_cls: 0.2494  enc_loss_bbox: 1.7102  enc_loss_iou: 1.9786  dn_loss_cls: 0.6944  dn_loss_bbox: 0.8987  dn_loss_iou: 1.2998  d0.dn_loss_cls: 0.8105  d0.dn_loss_bbox: 0.8977  d0.dn_loss_iou: 1.3003  d1.dn_loss_cls: 0.7438  d1.dn_loss_bbox: 0.8976  d1.dn_loss_iou: 1.3000  d2.dn_loss_cls: 0.7134  d2.dn_loss_bbox: 0.8977  d2.dn_loss_iou: 1.2998  d3.dn_loss_cls: 0.7034  d3.dn_loss_bbox: 0.8979  d3.dn_loss_iou: 1.2997  d4.dn_loss_cls: 0.6896  d4.dn_loss_bbox: 0.8982  d4.dn_loss_iou: 1.2996
07/04 08:07:49 - mmengine - INFO - Epoch(train)  [1][ 200/7393]  base_lr: 1.0045e-05 lr: 1.0045e-05  eta: 3 days, 11:41:22  time: 0.5681  data_time: 0.0172  memory: 14765  grad_norm: 31.5467  loss: 43.6378  loss_cls: 0.2767  loss_bbox: 1.5410  loss_iou: 1.8848  d0.loss_cls: 0.2576  d0.loss_bbox: 1.6095  d0.loss_iou: 1.9528  d1.loss_cls: 0.2533  d1.loss_bbox: 1.5953  d1.loss_iou: 1.9377  d2.loss_cls: 0.2638  d2.loss_bbox: 1.5764  d2.loss_iou: 1.9223  d3.loss_cls: 0.2662  d3.loss_bbox: 1.5630  d3.loss_iou: 1.9058  d4.loss_cls: 0.2720  d4.loss_bbox: 1.5519  d4.loss_iou: 1.8941  enc_loss_cls: 0.2644  enc_loss_bbox: 1.6327  enc_loss_iou: 1.9707  dn_loss_cls: 0.6153  dn_loss_bbox: 0.8752  dn_loss_iou: 1.3542  d0.dn_loss_cls: 0.7631  d0.dn_loss_bbox: 0.8617  d0.dn_loss_iou: 1.3491  d1.dn_loss_cls: 0.6712  d1.dn_loss_bbox: 0.8626  d1.dn_loss_iou: 1.3489  d2.dn_loss_cls: 0.6450  d2.dn_loss_bbox: 0.8645  d2.dn_loss_iou: 1.3492  d3.dn_loss_cls: 0.6290  d3.dn_loss_bbox: 0.8674  d3.dn_loss_iou: 1.3502  d4.dn_loss_cls: 0.6166  d4.dn_loss_bbox: 0.8711  d4.dn_loss_iou: 1.3518
07/04 08:08:17 - mmengine - INFO - Epoch(train)  [1][ 250/7393]  base_lr: 1.2544e-05 lr: 1.2544e-05  eta: 3 days, 11:21:57  time: 0.5555  data_time: 0.0152  memory: 14765  grad_norm: 44.9920  loss: 42.1280  loss_cls: 0.3194  loss_bbox: 1.4485  loss_iou: 1.7340  d0.loss_cls: 0.2736  d0.loss_bbox: 1.5596  d0.loss_iou: 1.8383  d1.loss_cls: 0.2846  d1.loss_bbox: 1.5293  d1.loss_iou: 1.8090  d2.loss_cls: 0.2936  d2.loss_bbox: 1.5019  d2.loss_iou: 1.7792  d3.loss_cls: 0.3022  d3.loss_bbox: 1.4815  d3.loss_iou: 1.7609  d4.loss_cls: 0.3108  d4.loss_bbox: 1.4662  d4.loss_iou: 1.7443  enc_loss_cls: 0.2809  enc_loss_bbox: 1.5873  enc_loss_iou: 1.8681  dn_loss_cls: 0.5758  dn_loss_bbox: 0.9277  dn_loss_iou: 1.3218  d0.dn_loss_cls: 0.7110  d0.dn_loss_bbox: 0.8735  d0.dn_loss_iou: 1.3049  d1.dn_loss_cls: 0.6347  d1.dn_loss_bbox: 0.8782  d1.dn_loss_iou: 1.3047  d2.dn_loss_cls: 0.6116  d2.dn_loss_bbox: 0.8877  d2.dn_loss_iou: 1.3067  d3.dn_loss_cls: 0.5948  d3.dn_loss_bbox: 0.8998  d3.dn_loss_iou: 1.3106  d4.dn_loss_cls: 0.5818  d4.dn_loss_bbox: 0.9138  d4.dn_loss_iou: 1.3159
07/04 08:08:44 - mmengine - INFO - Epoch(train)  [1][ 300/7393]  base_lr: 1.5043e-05 lr: 1.5043e-05  eta: 3 days, 10:57:54  time: 0.5481  data_time: 0.0157  memory: 14765  grad_norm: 60.9143  loss: 42.8167  loss_cls: 0.3396  loss_bbox: 1.4390  loss_iou: 1.7847  d0.loss_cls: 0.2928  d0.loss_bbox: 1.5773  d0.loss_iou: 1.9057  d1.loss_cls: 0.3062  d1.loss_bbox: 1.5216  d1.loss_iou: 1.8683  d2.loss_cls: 0.3159  d2.loss_bbox: 1.4885  d2.loss_iou: 1.8307  d3.loss_cls: 0.3284  d3.loss_bbox: 1.4630  d3.loss_iou: 1.8087  d4.loss_cls: 0.3335  d4.loss_bbox: 1.4500  d4.loss_iou: 1.7958  enc_loss_cls: 0.2918  enc_loss_bbox: 1.6272  enc_loss_iou: 1.9487  dn_loss_cls: 0.5238  dn_loss_bbox: 0.9650  dn_loss_iou: 1.3762  d0.dn_loss_cls: 0.6764  d0.dn_loss_bbox: 0.8680  d0.dn_loss_iou: 1.3426  d1.dn_loss_cls: 0.6009  d1.dn_loss_bbox: 0.8842  d1.dn_loss_iou: 1.3468  d2.dn_loss_cls: 0.5667  d2.dn_loss_bbox: 0.9087  d2.dn_loss_iou: 1.3543  d3.dn_loss_cls: 0.5454  d3.dn_loss_bbox: 0.9308  d3.dn_loss_iou: 1.3615  d4.dn_loss_cls: 0.5290  d4.dn_loss_bbox: 0.9492  d4.dn_loss_iou: 1.3695
07/04 08:09:11 - mmengine - INFO - Epoch(train)  [1][ 350/7393]  base_lr: 1.7541e-05 lr: 1.7541e-05  eta: 3 days, 10:43:19  time: 0.5503  data_time: 0.0154  memory: 14765  grad_norm: 61.2643  loss: 40.4907  loss_cls: 0.3642  loss_bbox: 1.2941  loss_iou: 1.5588  d0.loss_cls: 0.3258  d0.loss_bbox: 1.3989  d0.loss_iou: 1.6473  d1.loss_cls: 0.3411  d1.loss_bbox: 1.3493  d1.loss_iou: 1.6065  d2.loss_cls: 0.3508  d2.loss_bbox: 1.3193  d2.loss_iou: 1.5840  d3.loss_cls: 0.3579  d3.loss_bbox: 1.3090  d3.loss_iou: 1.5727  d4.loss_cls: 0.3606  d4.loss_bbox: 1.2991  d4.loss_iou: 1.5673  enc_loss_cls: 0.3007  enc_loss_bbox: 1.4747  enc_loss_iou: 1.7077  dn_loss_cls: 0.5109  dn_loss_bbox: 1.0116  dn_loss_iou: 1.3840  d0.dn_loss_cls: 0.6636  d0.dn_loss_bbox: 0.9152  d0.dn_loss_iou: 1.3475  d1.dn_loss_cls: 0.5838  d1.dn_loss_bbox: 0.9456  d1.dn_loss_iou: 1.3578  d2.dn_loss_cls: 0.5458  d2.dn_loss_bbox: 0.9740  d2.dn_loss_iou: 1.3692  d3.dn_loss_cls: 0.5266  d3.dn_loss_bbox: 0.9921  d3.dn_loss_iou: 1.3761  d4.dn_loss_cls: 0.5128  d4.dn_loss_bbox: 1.0034  d4.dn_loss_iou: 1.3807
07/04 08:09:39 - mmengine - INFO - Epoch(train)  [1][ 400/7393]  base_lr: 2.0040e-05 lr: 2.0040e-05  eta: 3 days, 10:31:04  time: 0.5492  data_time: 0.0151  memory: 14765  grad_norm: 33.5409  loss: 40.0292  loss_cls: 0.4119  loss_bbox: 1.2533  loss_iou: 1.6856  d0.loss_cls: 0.4111  d0.loss_bbox: 1.3096  d0.loss_iou: 1.7348  d1.loss_cls: 0.4076  d1.loss_bbox: 1.2832  d1.loss_iou: 1.7126  d2.loss_cls: 0.4102  d2.loss_bbox: 1.2721  d2.loss_iou: 1.7005  d3.loss_cls: 0.4077  d3.loss_bbox: 1.2651  d3.loss_iou: 1.6928  d4.loss_cls: 0.4102  d4.loss_bbox: 1.2612  d4.loss_iou: 1.6903  enc_loss_cls: 0.4030  enc_loss_bbox: 1.3718  enc_loss_iou: 1.7791  dn_loss_cls: 0.4826  dn_loss_bbox: 0.8943  dn_loss_iou: 1.3057  d0.dn_loss_cls: 0.6012  d0.dn_loss_bbox: 0.8433  d0.dn_loss_iou: 1.2954  d1.dn_loss_cls: 0.5268  d1.dn_loss_bbox: 0.8680  d1.dn_loss_iou: 1.3004  d2.dn_loss_cls: 0.4945  d2.dn_loss_bbox: 0.8828  d2.dn_loss_iou: 1.3042  d3.dn_loss_cls: 0.4828  d3.dn_loss_bbox: 0.8900  d3.dn_loss_iou: 1.3055  d4.dn_loss_cls: 0.4786  d4.dn_loss_bbox: 0.8933  d4.dn_loss_iou: 1.3059
07/04 08:10:06 - mmengine - INFO - Epoch(train)  [1][ 450/7393]  base_lr: 2.2539e-05 lr: 2.2539e-05  eta: 3 days, 10:23:02  time: 0.5508  data_time: 0.0157  memory: 14765  grad_norm: 35.2407  loss: 38.5235  loss_cls: 0.4331  loss_bbox: 1.1496  loss_iou: 1.6211  d0.loss_cls: 0.4369  d0.loss_bbox: 1.1949  d0.loss_iou: 1.6464  d1.loss_cls: 0.4298  d1.loss_bbox: 1.1862  d1.loss_iou: 1.6364  d2.loss_cls: 0.4318  d2.loss_bbox: 1.1758  d2.loss_iou: 1.6313  d3.loss_cls: 0.4312  d3.loss_bbox: 1.1659  d3.loss_iou: 1.6291  d4.loss_cls: 0.4283  d4.loss_bbox: 1.1588  d4.loss_iou: 1.6282  enc_loss_cls: 0.4427  enc_loss_bbox: 1.2188  enc_loss_iou: 1.6747  dn_loss_cls: 0.4748  dn_loss_bbox: 0.8346  dn_loss_iou: 1.2997  d0.dn_loss_cls: 0.5773  d0.dn_loss_bbox: 0.8328  d0.dn_loss_iou: 1.2894  d1.dn_loss_cls: 0.5104  d1.dn_loss_bbox: 0.8386  d1.dn_loss_iou: 1.2899  d2.dn_loss_cls: 0.4853  d2.dn_loss_bbox: 0.8405  d2.dn_loss_iou: 1.2908  d3.dn_loss_cls: 0.4742  d3.dn_loss_bbox: 0.8398  d3.dn_loss_iou: 1.2923  d4.dn_loss_cls: 0.4703  d4.dn_loss_bbox: 0.8376  d4.dn_loss_iou: 1.2944
07/04 08:10:34 - mmengine - INFO - Epoch(train)  [1][ 500/7393]  base_lr: 2.5038e-05 lr: 2.5038e-05  eta: 3 days, 10:10:42  time: 0.5443  data_time: 0.0147  memory: 14765  grad_norm: 37.2461  loss: 38.9175  loss_cls: 0.4576  loss_bbox: 1.1115  loss_iou: 1.5468  d0.loss_cls: 0.4658  d0.loss_bbox: 1.1688  d0.loss_iou: 1.5733  d1.loss_cls: 0.4631  d1.loss_bbox: 1.1504  d1.loss_iou: 1.5595  d2.loss_cls: 0.4593  d2.loss_bbox: 1.1382  d2.loss_iou: 1.5545  d3.loss_cls: 0.4599  d3.loss_bbox: 1.1282  d3.loss_iou: 1.5493  d4.loss_cls: 0.4554  d4.loss_bbox: 1.1210  d4.loss_iou: 1.5480  enc_loss_cls: 0.4715  enc_loss_bbox: 1.1972  enc_loss_iou: 1.5967  dn_loss_cls: 0.4749  dn_loss_bbox: 0.9303  dn_loss_iou: 1.3639  d0.dn_loss_cls: 0.5827  d0.dn_loss_bbox: 0.9371  d0.dn_loss_iou: 1.3517  d1.dn_loss_cls: 0.5124  d1.dn_loss_bbox: 0.9391  d1.dn_loss_iou: 1.3514  d2.dn_loss_cls: 0.4858  d2.dn_loss_bbox: 0.9377  d2.dn_loss_iou: 1.3518  d3.dn_loss_cls: 0.4741  d3.dn_loss_bbox: 0.9345  d3.dn_loss_iou: 1.3539  d4.dn_loss_cls: 0.4702  d4.dn_loss_bbox: 0.9320  d4.dn_loss_iou: 1.3578
07/04 08:11:01 - mmengine - INFO - Epoch(train)  [1][ 550/7393]  base_lr: 2.7536e-05 lr: 2.7536e-05  eta: 3 days, 10:00:12  time: 0.5439  data_time: 0.0147  memory: 14758  grad_norm: 36.4342  loss: 36.6310  loss_cls: 0.4611  loss_bbox: 1.0025  loss_iou: 1.4896  d0.loss_cls: 0.4715  d0.loss_bbox: 1.0635  d0.loss_iou: 1.5199  d1.loss_cls: 0.4629  d1.loss_bbox: 1.0497  d1.loss_iou: 1.5036  d2.loss_cls: 0.4585  d2.loss_bbox: 1.0311  d2.loss_iou: 1.5001  d3.loss_cls: 0.4554  d3.loss_bbox: 1.0175  d3.loss_iou: 1.4960  d4.loss_cls: 0.4567  d4.loss_bbox: 1.0077  d4.loss_iou: 1.4940  enc_loss_cls: 0.4967  enc_loss_bbox: 1.0838  enc_loss_iou: 1.5300  dn_loss_cls: 0.4302  dn_loss_bbox: 0.8524  dn_loss_iou: 1.2964  d0.dn_loss_cls: 0.5308  d0.dn_loss_bbox: 0.8607  d0.dn_loss_iou: 1.2806  d1.dn_loss_cls: 0.4662  d1.dn_loss_bbox: 0.8604  d1.dn_loss_iou: 1.2808  d2.dn_loss_cls: 0.4389  d2.dn_loss_bbox: 0.8570  d2.dn_loss_iou: 1.2840  d3.dn_loss_cls: 0.4270  d3.dn_loss_bbox: 0.8542  d3.dn_loss_iou: 1.2887  d4.dn_loss_cls: 0.4255  d4.dn_loss_bbox: 0.8530  d4.dn_loss_iou: 1.2925
07/04 08:11:28 - mmengine - INFO - Epoch(train)  [1][ 600/7393]  base_lr: 3.0035e-05 lr: 3.0035e-05  eta: 3 days, 9:55:35  time: 0.5495  data_time: 0.0150  memory: 14765  grad_norm: 45.7821  loss: 35.4919  loss_cls: 0.4403  loss_bbox: 0.9375  loss_iou: 1.5234  d0.loss_cls: 0.4348  d0.loss_bbox: 1.0059  d0.loss_iou: 1.5596  d1.loss_cls: 0.4300  d1.loss_bbox: 0.9801  d1.loss_iou: 1.5467  d2.loss_cls: 0.4333  d2.loss_bbox: 0.9603  d2.loss_iou: 1.5374  d3.loss_cls: 0.4344  d3.loss_bbox: 0.9509  d3.loss_iou: 1.5301  d4.loss_cls: 0.4367  d4.loss_bbox: 0.9454  d4.loss_iou: 1.5264  enc_loss_cls: 0.4499  enc_loss_bbox: 1.0371  enc_loss_iou: 1.5808  dn_loss_cls: 0.3975  dn_loss_bbox: 0.7696  dn_loss_iou: 1.2939  d0.dn_loss_cls: 0.4900  d0.dn_loss_bbox: 0.7727  d0.dn_loss_iou: 1.2678  d1.dn_loss_cls: 0.4287  d1.dn_loss_bbox: 0.7711  d1.dn_loss_iou: 1.2699  d2.dn_loss_cls: 0.4052  d2.dn_loss_bbox: 0.7693  d2.dn_loss_iou: 1.2763  d3.dn_loss_cls: 0.3947  d3.dn_loss_bbox: 0.7690  d3.dn_loss_iou: 1.2838  d4.dn_loss_cls: 0.3937  d4.dn_loss_bbox: 0.7690  d4.dn_loss_iou: 1.2887
07/04 08:11:56 - mmengine - INFO - Epoch(train)  [1][ 650/7393]  base_lr: 3.2534e-05 lr: 3.2534e-05  eta: 3 days, 9:52:00  time: 0.5501  data_time: 0.0147  memory: 14765  grad_norm: 41.7852  loss: 36.1583  loss_cls: 0.4479  loss_bbox: 0.8858  loss_iou: 1.4241  d0.loss_cls: 0.4368  d0.loss_bbox: 0.9667  d0.loss_iou: 1.4606  d1.loss_cls: 0.4383  d1.loss_bbox: 0.9283  d1.loss_iou: 1.4457  d2.loss_cls: 0.4392  d2.loss_bbox: 0.9078  d2.loss_iou: 1.4402  d3.loss_cls: 0.4414  d3.loss_bbox: 0.8961  d3.loss_iou: 1.4330  d4.loss_cls: 0.4448  d4.loss_bbox: 0.8903  d4.loss_iou: 1.4293  enc_loss_cls: 0.4472  enc_loss_bbox: 1.0002  enc_loss_iou: 1.4818  dn_loss_cls: 0.4390  dn_loss_bbox: 0.9109  dn_loss_iou: 1.3878  d0.dn_loss_cls: 0.5305  d0.dn_loss_bbox: 0.9128  d0.dn_loss_iou: 1.3656  d1.dn_loss_cls: 0.4638  d1.dn_loss_bbox: 0.9099  d1.dn_loss_iou: 1.3703  d2.dn_loss_cls: 0.4431  d2.dn_loss_bbox: 0.9092  d2.dn_loss_iou: 1.3773  d3.dn_loss_cls: 0.4326  d3.dn_loss_bbox: 0.9097  d3.dn_loss_iou: 1.3823  d4.dn_loss_cls: 0.4329  d4.dn_loss_bbox: 0.9099  d4.dn_loss_iou: 1.3852
07/04 08:12:23 - mmengine - INFO - Epoch(train)  [1][ 700/7393]  base_lr: 3.5033e-05 lr: 3.5033e-05  eta: 3 days, 9:46:36  time: 0.5465  data_time: 0.0154  memory: 14765  grad_norm: 44.8541  loss: 37.7555  loss_cls: 0.5058  loss_bbox: 0.9555  loss_iou: 1.5328  d0.loss_cls: 0.4985  d0.loss_bbox: 1.0394  d0.loss_iou: 1.5688  d1.loss_cls: 0.4966  d1.loss_bbox: 0.9987  d1.loss_iou: 1.5547  d2.loss_cls: 0.5005  d2.loss_bbox: 0.9758  d2.loss_iou: 1.5454  d3.loss_cls: 0.4992  d3.loss_bbox: 0.9658  d3.loss_iou: 1.5414  d4.loss_cls: 0.5030  d4.loss_bbox: 0.9641  d4.loss_iou: 1.5358  enc_loss_cls: 0.5167  enc_loss_bbox: 1.0803  enc_loss_iou: 1.5922  dn_loss_cls: 0.4169  dn_loss_bbox: 0.9093  dn_loss_iou: 1.4012  d0.dn_loss_cls: 0.5026  d0.dn_loss_bbox: 0.9080  d0.dn_loss_iou: 1.3735  d1.dn_loss_cls: 0.4408  d1.dn_loss_bbox: 0.9062  d1.dn_loss_iou: 1.3805  d2.dn_loss_cls: 0.4196  d2.dn_loss_bbox: 0.9070  d2.dn_loss_iou: 1.3888  d3.dn_loss_cls: 0.4107  d3.dn_loss_bbox: 0.9079  d3.dn_loss_iou: 1.3946  d4.dn_loss_cls: 0.4106  d4.dn_loss_bbox: 0.9083  d4.dn_loss_iou: 1.3980
07/04 08:12:50 - mmengine - INFO - Epoch(train)  [1][ 750/7393]  base_lr: 3.7531e-05 lr: 3.7531e-05  eta: 3 days, 9:40:59  time: 0.5451  data_time: 0.0148  memory: 14765  grad_norm: 45.8713  loss: 37.4994  loss_cls: 0.5150  loss_bbox: 0.9276  loss_iou: 1.5517  d0.loss_cls: 0.5015  d0.loss_bbox: 1.0159  d0.loss_iou: 1.5910  d1.loss_cls: 0.5114  d1.loss_bbox: 0.9666  d1.loss_iou: 1.5669  d2.loss_cls: 0.5097  d2.loss_bbox: 0.9458  d2.loss_iou: 1.5598  d3.loss_cls: 0.5093  d3.loss_bbox: 0.9361  d3.loss_iou: 1.5568  d4.loss_cls: 0.5111  d4.loss_bbox: 0.9320  d4.loss_iou: 1.5540  enc_loss_cls: 0.5083  enc_loss_bbox: 1.0715  enc_loss_iou: 1.6195  dn_loss_cls: 0.4075  dn_loss_bbox: 0.8807  dn_loss_iou: 1.4021  d0.dn_loss_cls: 0.4841  d0.dn_loss_bbox: 0.8750  d0.dn_loss_iou: 1.3766  d1.dn_loss_cls: 0.4240  d1.dn_loss_bbox: 0.8738  d1.dn_loss_iou: 1.3849  d2.dn_loss_cls: 0.4053  d2.dn_loss_bbox: 0.8765  d2.dn_loss_iou: 1.3922  d3.dn_loss_cls: 0.3988  d3.dn_loss_bbox: 0.8783  d3.dn_loss_iou: 1.3974  d4.dn_loss_cls: 0.4011  d4.dn_loss_bbox: 0.8793  d4.dn_loss_iou: 1.3999
07/04 08:13:18 - mmengine - INFO - Epoch(train)  [1][ 800/7393]  base_lr: 4.0030e-05 lr: 4.0030e-05  eta: 3 days, 9:33:46  time: 0.5410  data_time: 0.0151  memory: 14765  grad_norm: 45.5470  loss: 36.6604  loss_cls: 0.4964  loss_bbox: 0.9202  loss_iou: 1.5986  d0.loss_cls: 0.4949  d0.loss_bbox: 1.0054  d0.loss_iou: 1.6342  d1.loss_cls: 0.4981  d1.loss_bbox: 0.9539  d1.loss_iou: 1.6156  d2.loss_cls: 0.4967  d2.loss_bbox: 0.9362  d2.loss_iou: 1.6053  d3.loss_cls: 0.4918  d3.loss_bbox: 0.9326  d3.loss_iou: 1.6021  d4.loss_cls: 0.4931  d4.loss_bbox: 0.9266  d4.loss_iou: 1.6014  enc_loss_cls: 0.5155  enc_loss_bbox: 1.0780  enc_loss_iou: 1.6599  dn_loss_cls: 0.3780  dn_loss_bbox: 0.8335  dn_loss_iou: 1.3062  d0.dn_loss_cls: 0.4438  d0.dn_loss_bbox: 0.8213  d0.dn_loss_iou: 1.2897  d1.dn_loss_cls: 0.3920  d1.dn_loss_bbox: 0.8235  d1.dn_loss_iou: 1.2961  d2.dn_loss_cls: 0.3763  d2.dn_loss_bbox: 0.8283  d2.dn_loss_iou: 1.3011  d3.dn_loss_cls: 0.3707  d3.dn_loss_bbox: 0.8302  d3.dn_loss_iou: 1.3037  d4.dn_loss_cls: 0.3725  d4.dn_loss_bbox: 0.8317  d4.dn_loss_iou: 1.3049
# ppdet  8bs x 2GPU
[07/04 07:42:40] ppdet.engine INFO: Epoch: [0] [   0/7329] learning_rate: 0.000000 loss_class: 0.683346 loss_bbox: 1.285977 loss_giou: 0.964165 loss_class_aux: 1.870583 loss_bbox_aux: 7.731508 loss_giou_aux: 5.904184 loss_class_dn: 0.686849 loss_bbox_dn: 0.716623 loss_giou_dn: 0.833695 loss_class_aux_dn: 2.812277 loss_bbox_aux_dn: 3.583115 loss_giou_aux_dn: 4.168473 loss: 31.240795 eta: 21 days, 6:44:52 batch_cost: 3.4844 data_cost: 0.0005 ips: 2.2959 images/s
[07/04 07:43:03] ppdet.engine INFO: Epoch: [0] [  50/7329] learning_rate: 0.000003 loss_class: 0.998480 loss_bbox: 1.382280 loss_giou: 1.748477 loss_class_aux: 1.655656 loss_bbox_aux: 8.407383 loss_giou_aux: 10.586290 loss_class_dn: 1.017919 loss_bbox_dn: 0.802868 loss_giou_dn: 1.311996 loss_class_aux_dn: 4.180416 loss_bbox_aux_dn: 4.014454 loss_giou_aux_dn: 6.561262 loss: 43.388004 eta: 2 days, 21:29:28 batch_cost: 0.4139 data_cost: 0.0004 ips: 19.3273 images/s
[07/04 07:43:25] ppdet.engine INFO: Epoch: [0] [ 100/7329] learning_rate: 0.000005 loss_class: 0.944255 loss_bbox: 1.379791 loss_giou: 1.742733 loss_class_aux: 1.795154 loss_bbox_aux: 8.391826 loss_giou_aux: 10.500675 loss_class_dn: 0.946447 loss_bbox_dn: 0.770575 loss_giou_dn: 1.316061 loss_class_aux_dn: 3.990767 loss_bbox_aux_dn: 3.848480 loss_giou_aux_dn: 6.583784 loss: 41.833054 eta: 2 days, 16:23:24 batch_cost: 0.4039 data_cost: 0.0004 ips: 19.8064 images/s
[07/04 07:43:47] ppdet.engine INFO: Epoch: [0] [ 150/7329] learning_rate: 0.000008 loss_class: 0.865210 loss_bbox: 1.466716 loss_giou: 1.591932 loss_class_aux: 1.756492 loss_bbox_aux: 8.951975 loss_giou_aux: 9.711206 loss_class_dn: 0.842548 loss_bbox_dn: 0.834488 loss_giou_dn: 1.259095 loss_class_aux_dn: 3.579963 loss_bbox_aux_dn: 4.175209 loss_giou_aux_dn: 6.308812 loss: 42.384117 eta: 2 days, 14:14:23 batch_cost: 0.3952 data_cost: 0.0004 ips: 20.2440 images/s
[07/04 07:44:10] ppdet.engine INFO: Epoch: [0] [ 200/7329] learning_rate: 0.000010 loss_class: 0.821579 loss_bbox: 1.468903 loss_giou: 1.514127 loss_class_aux: 1.873052 loss_bbox_aux: 9.161399 loss_giou_aux: 9.294691 loss_class_dn: 0.771384 loss_bbox_dn: 0.882304 loss_giou_dn: 1.211941 loss_class_aux_dn: 3.285707 loss_bbox_aux_dn: 4.335419 loss_giou_aux_dn: 6.048694 loss: 40.782269 eta: 2 days, 13:46:32 batch_cost: 0.4122 data_cost: 0.0004 ips: 19.4101 images/s
[07/04 07:44:33] ppdet.engine INFO: Epoch: [0] [ 250/7329] learning_rate: 0.000013 loss_class: 0.827880 loss_bbox: 1.381951 loss_giou: 1.455075 loss_class_aux: 2.192601 loss_bbox_aux: 8.752991 loss_giou_aux: 9.094400 loss_class_dn: 0.770761 loss_bbox_dn: 0.938643 loss_giou_dn: 1.203800 loss_class_aux_dn: 3.360187 loss_bbox_aux_dn: 4.527280 loss_giou_aux_dn: 6.029949 loss: 41.134167 eta: 2 days, 13:39:11 batch_cost: 0.4176 data_cost: 0.0004 ips: 19.1563 images/s
[07/04 07:44:54] ppdet.engine INFO: Epoch: [0] [ 300/7329] learning_rate: 0.000015 loss_class: 0.878376 loss_bbox: 1.304542 loss_giou: 1.526390 loss_class_aux: 2.675135 loss_bbox_aux: 8.318256 loss_giou_aux: 9.266665 loss_class_dn: 0.797022 loss_bbox_dn: 0.914100 loss_giou_dn: 1.320687 loss_class_aux_dn: 3.393513 loss_bbox_aux_dn: 4.405071 loss_giou_aux_dn: 6.539315 loss: 43.072666 eta: 2 days, 12:48:05 batch_cost: 0.3861 data_cost: 0.0004 ips: 20.7226 images/s
[07/04 07:45:16] ppdet.engine INFO: Epoch: [0] [ 350/7329] learning_rate: 0.000018 loss_class: 0.894463 loss_bbox: 1.256612 loss_giou: 1.769405 loss_class_aux: 2.898498 loss_bbox_aux: 8.179535 loss_giou_aux: 10.855463 loss_class_dn: 0.813604 loss_bbox_dn: 0.898470 loss_giou_dn: 1.532507 loss_class_aux_dn: 3.429995 loss_bbox_aux_dn: 4.291843 loss_giou_aux_dn: 7.670493 loss: 44.766773 eta: 2 days, 12:33:08 batch_cost: 0.4034 data_cost: 0.0004 ips: 19.8330 images/s
[07/04 07:45:38] ppdet.engine INFO: Epoch: [0] [ 400/7329] learning_rate: 0.000020 loss_class: 0.790304 loss_bbox: 1.028785 loss_giou: 1.359278 loss_class_aux: 3.110061 loss_bbox_aux: 6.837068 loss_giou_aux: 8.406872 loss_class_dn: 0.694199 loss_bbox_dn: 0.847215 loss_giou_dn: 1.303349 loss_class_aux_dn: 3.033953 loss_bbox_aux_dn: 4.142806 loss_giou_aux_dn: 6.541418 loss: 38.661110 eta: 2 days, 12:10:25 batch_cost: 0.3930 data_cost: 0.0004 ips: 20.3581 images/s
[07/04 07:45:59] ppdet.engine INFO: Epoch: [0] [ 450/7329] learning_rate: 0.000023 loss_class: 0.831289 loss_bbox: 0.900885 loss_giou: 1.274977 loss_class_aux: 3.937345 loss_bbox_aux: 6.190751 loss_giou_aux: 8.041082 loss_class_dn: 0.692314 loss_bbox_dn: 0.861287 loss_giou_dn: 1.369993 loss_class_aux_dn: 3.098099 loss_bbox_aux_dn: 4.358569 loss_giou_aux_dn: 6.602709 loss: 38.432137 eta: 2 days, 11:41:49 batch_cost: 0.3818 data_cost: 0.0004 ips: 20.9528 images/s
[07/04 07:46:20] ppdet.engine INFO: Epoch: [0] [ 500/7329] learning_rate: 0.000025 loss_class: 0.768482 loss_bbox: 0.760391 loss_giou: 1.233569 loss_class_aux: 3.949719 loss_bbox_aux: 4.985241 loss_giou_aux: 7.611237 loss_class_dn: 0.649078 loss_bbox_dn: 0.782088 loss_giou_dn: 1.294421 loss_class_aux_dn: 2.896272 loss_bbox_aux_dn: 3.969111 loss_giou_aux_dn: 6.413960 loss: 36.220665 eta: 2 days, 11:22:51 batch_cost: 0.3864 data_cost: 0.0004 ips: 20.7046 images/s
[07/04 07:46:42] ppdet.engine INFO: Epoch: [0] [ 550/7329] learning_rate: 0.000028 loss_class: 0.728091 loss_bbox: 0.776833 loss_giou: 1.123835 loss_class_aux: 3.761339 loss_bbox_aux: 5.018996 loss_giou_aux: 7.186420 loss_class_dn: 0.616948 loss_bbox_dn: 0.788702 loss_giou_dn: 1.239618 loss_class_aux_dn: 2.915491 loss_bbox_aux_dn: 3.992010 loss_giou_aux_dn: 6.192928 loss: 36.567657 eta: 2 days, 11:14:37 batch_cost: 0.3956 data_cost: 0.0004 ips: 20.2233 images/s
[07/04 07:47:04] ppdet.engine INFO: Epoch: [0] [ 600/7329] learning_rate: 0.000030 loss_class: 0.763762 loss_bbox: 0.656370 loss_giou: 1.219659 loss_class_aux: 4.180519 loss_bbox_aux: 4.357739 loss_giou_aux: 7.853674 loss_class_dn: 0.616509 loss_bbox_dn: 0.794917 loss_giou_dn: 1.358421 loss_class_aux_dn: 2.903438 loss_bbox_aux_dn: 4.027238 loss_giou_aux_dn: 6.720316 loss: 35.873466 eta: 2 days, 11:02:11 batch_cost: 0.3881 data_cost: 0.0004 ips: 20.6158 images/s
[07/04 07:47:25] ppdet.engine INFO: Epoch: [0] [ 650/7329] learning_rate: 0.000033 loss_class: 0.737461 loss_bbox: 0.583091 loss_giou: 1.170799 loss_class_aux: 4.762381 loss_bbox_aux: 3.967538 loss_giou_aux: 7.494648 loss_class_dn: 0.605938 loss_bbox_dn: 0.730950 loss_giou_dn: 1.339651 loss_class_aux_dn: 2.837840 loss_bbox_aux_dn: 3.697802 loss_giou_aux_dn: 6.588319 loss: 35.522499 eta: 2 days, 10:55:36 batch_cost: 0.3939 data_cost: 0.0004 ips: 20.3075 images/s
[07/04 07:47:48] ppdet.engine INFO: Epoch: [0] [ 700/7329] learning_rate: 0.000035 loss_class: 0.731348 loss_bbox: 0.576231 loss_giou: 1.091594 loss_class_aux: 5.038460 loss_bbox_aux: 3.846195 loss_giou_aux: 7.036030 loss_class_dn: 0.584472 loss_bbox_dn: 0.741416 loss_giou_dn: 1.262873 loss_class_aux_dn: 2.908104 loss_bbox_aux_dn: 3.811942 loss_giou_aux_dn: 6.383394 loss: 35.792530 eta: 2 days, 10:59:28 batch_cost: 0.4092 data_cost: 0.0004 ips: 19.5496 images/s
[07/04 07:48:11] ppdet.engine INFO: Epoch: [0] [ 750/7329] learning_rate: 0.000038 loss_class: 0.719343 loss_bbox: 0.521474 loss_giou: 0.999734 loss_class_aux: 5.578853 loss_bbox_aux: 3.550250 loss_giou_aux: 6.466695 loss_class_dn: 0.550823 loss_bbox_dn: 0.694916 loss_giou_dn: 1.221618 loss_class_aux_dn: 2.914670 loss_bbox_aux_dn: 3.519604 loss_giou_aux_dn: 6.196606 loss: 33.507900 eta: 2 days, 11:06:23 batch_cost: 0.4154 data_cost: 0.0004 ips: 19.2591 images/s
[07/04 07:48:33] ppdet.engine INFO: Epoch: [0] [ 800/7329] learning_rate: 0.000040 loss_class: 0.676013 loss_bbox: 0.478862 loss_giou: 1.018566 loss_class_aux: 4.934841 loss_bbox_aux: 3.229126 loss_giou_aux: 6.532661 loss_class_dn: 0.514488 loss_bbox_dn: 0.692227 loss_giou_dn: 1.252319 loss_class_aux_dn: 2.825400 loss_bbox_aux_dn: 3.580340 loss_giou_aux_dn: 6.327948 loss: 31.801727 eta: 2 days, 11:08:38 batch_cost: 0.4085 data_cost: 0.0004 ips: 19.5827 images/s

@nijkah (Contributor, Author) commented Jul 4, 2023

The above log from ppdet has an issue: PaddlePaddle/PaddleDetection#8409.
I used the latest commit, which has a bug in setting iou_score.
After fixing it, I got the logs below from ppdet.

# ppdet  8bs x 2GPU
[07/04 15:02:50] ppdet.engine INFO: Epoch: [0] [   0/7329] learning_rate: 0.000000 loss_class: 0.201641 loss_bbox: 1.727503 loss_giou: 1.307491 loss_class_aux: 1.212130 loss_bbox_aux: 10.493377 loss_giou_aux: 7.949263 loss_class_dn: 0.635403 loss_bbox_dn: 0.939284 loss_giou_dn: 1.015846 loss_class_aux_dn: 3.224553 loss_bbox_aux_dn: 4.696418 loss_giou_aux_dn: 5.079230 loss: 38.482140 eta: 14 days, 23:48:57 batch_cost: 2.4547 data_cost: 0.0005 ips: 3.2590 images/s
[07/04 15:03:13] ppdet.engine INFO: Epoch: [0] [  50/7329] learning_rate: 0.000003 loss_class: 0.297633 loss_bbox: 1.479268 loss_giou: 1.815598 loss_class_aux: 1.675295 loss_bbox_aux: 9.019381 loss_giou_aux: 10.936712 loss_class_dn: 0.912077 loss_bbox_dn: 0.838664 loss_giou_dn: 1.344816 loss_class_aux_dn: 4.453175 loss_bbox_aux_dn: 4.192844 loss_giou_aux_dn: 6.724151 loss: 46.462036 eta: 2 days, 17:45:22 batch_cost: 0.4085 data_cost: 0.0005 ips: 19.5827 images/s
[07/04 15:03:35] ppdet.engine INFO: Epoch: [0] [ 100/7329] learning_rate: 0.000005 loss_class: 0.285327 loss_bbox: 1.477210 loss_giou: 1.603661 loss_class_aux: 1.596411 loss_bbox_aux: 8.960684 loss_giou_aux: 9.666679 loss_class_dn: 0.779881 loss_bbox_dn: 0.856219 loss_giou_dn: 1.201171 loss_class_aux_dn: 3.744936 loss_bbox_aux_dn: 4.275051 loss_giou_aux_dn: 6.006138 loss: 42.018841 eta: 2 days, 14:01:19 batch_cost: 0.3973 data_cost: 0.0005 ips: 20.1378 images/s
[07/04 15:03:57] ppdet.engine INFO: Epoch: [0] [ 150/7329] learning_rate: 0.000008 loss_class: 0.275387 loss_bbox: 1.372201 loss_giou: 1.629953 loss_class_aux: 1.592410 loss_bbox_aux: 8.422577 loss_giou_aux: 9.954062 loss_class_dn: 0.749548 loss_bbox_dn: 0.786661 loss_giou_dn: 1.268025 loss_class_aux_dn: 3.732531 loss_bbox_aux_dn: 3.917260 loss_giou_aux_dn: 6.346529 loss: 41.343445 eta: 2 days, 12:48:22 batch_cost: 0.3983 data_cost: 0.0005 ips: 20.0870 images/s
[07/04 15:04:19] ppdet.engine INFO: Epoch: [0] [ 200/7329] learning_rate: 0.000010 loss_class: 0.349416 loss_bbox: 1.482662 loss_giou: 1.670851 loss_class_aux: 1.887396 loss_bbox_aux: 9.177547 loss_giou_aux: 10.195324 loss_class_dn: 0.698555 loss_bbox_dn: 0.891267 loss_giou_dn: 1.347793 loss_class_aux_dn: 3.470482 loss_bbox_aux_dn: 4.455425 loss_giou_aux_dn: 6.663803 loss: 41.833828 eta: 2 days, 12:38:50 batch_cost: 0.4108 data_cost: 0.0006 ips: 19.4765 images/s
[07/04 15:04:41] ppdet.engine INFO: Epoch: [0] [ 250/7329] learning_rate: 0.000013 loss_class: 0.381666 loss_bbox: 1.361250 loss_giou: 1.545455 loss_class_aux: 2.009711 loss_bbox_aux: 8.443027 loss_giou_aux: 9.609318 loss_class_dn: 0.675151 loss_bbox_dn: 0.908471 loss_giou_dn: 1.278737 loss_class_aux_dn: 3.423970 loss_bbox_aux_dn: 4.396884 loss_giou_aux_dn: 6.381564 loss: 41.208412 eta: 2 days, 12:07:38 batch_cost: 0.3963 data_cost: 0.0005 ips: 20.1872 images/s
[07/04 15:05:03] ppdet.engine INFO: Epoch: [0] [ 300/7329] learning_rate: 0.000015 loss_class: 0.455796 loss_bbox: 1.304743 loss_giou: 1.480081 loss_class_aux: 2.504373 loss_bbox_aux: 8.138662 loss_giou_aux: 9.187222 loss_class_dn: 0.657482 loss_bbox_dn: 0.949222 loss_giou_dn: 1.317122 loss_class_aux_dn: 3.362846 loss_bbox_aux_dn: 4.543368 loss_giou_aux_dn: 6.522356 loss: 40.966015 eta: 2 days, 11:42:11 batch_cost: 0.3932 data_cost: 0.0005 ips: 20.3464 images/s
[07/04 15:05:26] ppdet.engine INFO: Epoch: [0] [ 350/7329] learning_rate: 0.000018 loss_class: 0.530600 loss_bbox: 1.225484 loss_giou: 1.654275 loss_class_aux: 2.943487 loss_bbox_aux: 7.658451 loss_giou_aux: 10.139181 loss_class_dn: 0.690101 loss_bbox_dn: 0.894718 loss_giou_dn: 1.529423 loss_class_aux_dn: 3.481877 loss_bbox_aux_dn: 4.247900 loss_giou_aux_dn: 7.611682 loss: 44.020210 eta: 2 days, 11:53:08 batch_cost: 0.4166 data_cost: 0.0006 ips: 19.2046 images/s
[07/04 15:05:48] ppdet.engine INFO: Epoch: [0] [ 400/7329] learning_rate: 0.000020 loss_class: 0.546594 loss_bbox: 1.050263 loss_giou: 1.361742 loss_class_aux: 3.038521 loss_bbox_aux: 6.921156 loss_giou_aux: 8.373913 loss_class_dn: 0.622643 loss_bbox_dn: 0.901651 loss_giou_dn: 1.274494 loss_class_aux_dn: 3.126308 loss_bbox_aux_dn: 4.433243 loss_giou_aux_dn: 6.330893 loss: 39.587692 eta: 2 days, 11:49:14 batch_cost: 0.4056 data_cost: 0.0005 ips: 19.7250 images/s
[07/04 15:06:11] ppdet.engine INFO: Epoch: [0] [ 450/7329] learning_rate: 0.000023 loss_class: 0.600202 loss_bbox: 0.908755 loss_giou: 1.338081 loss_class_aux: 3.383973 loss_bbox_aux: 6.074856 loss_giou_aux: 8.237648 loss_class_dn: 0.610507 loss_bbox_dn: 0.791783 loss_giou_dn: 1.329810 loss_class_aux_dn: 3.103679 loss_bbox_aux_dn: 4.026302 loss_giou_aux_dn: 6.615985 loss: 39.349503 eta: 2 days, 11:42:53 batch_cost: 0.4023 data_cost: 0.0005 ips: 19.8881 images/s
[07/04 15:06:34] ppdet.engine INFO: Epoch: [0] [ 500/7329] learning_rate: 0.000025 loss_class: 0.679902 loss_bbox: 0.800746 loss_giou: 1.200678 loss_class_aux: 3.808671 loss_bbox_aux: 5.183709 loss_giou_aux: 7.519331 loss_class_dn: 0.568733 loss_bbox_dn: 0.755130 loss_giou_dn: 1.312346 loss_class_aux_dn: 2.971537 loss_bbox_aux_dn: 3.829945 loss_giou_aux_dn: 6.568553 loss: 37.149818 eta: 2 days, 11:50:23 batch_cost: 0.4167 data_cost: 0.0005 ips: 19.2000 images/s
[07/04 15:06:56] ppdet.engine INFO: Epoch: [0] [ 550/7329] learning_rate: 0.000028 loss_class: 0.751524 loss_bbox: 0.671623 loss_giou: 1.125915 loss_class_aux: 4.166484 loss_bbox_aux: 4.458362 loss_giou_aux: 6.992559 loss_class_dn: 0.560734 loss_bbox_dn: 0.712450 loss_giou_dn: 1.272195 loss_class_aux_dn: 2.772212 loss_bbox_aux_dn: 3.673186 loss_giou_aux_dn: 6.362156 loss: 34.372982 eta: 2 days, 11:49:32 batch_cost: 0.4080 data_cost: 0.0005 ips: 19.6085 images/s
[07/04 15:07:18] ppdet.engine INFO: Epoch: [0] [ 600/7329] learning_rate: 0.000030 loss_class: 0.812774 loss_bbox: 0.652046 loss_giou: 1.190217 loss_class_aux: 4.567040 loss_bbox_aux: 4.310749 loss_giou_aux: 7.383471 loss_class_dn: 0.585980 loss_bbox_dn: 0.812320 loss_giou_dn: 1.316965 loss_class_aux_dn: 2.868565 loss_bbox_aux_dn: 4.059213 loss_giou_aux_dn: 6.592301 loss: 36.259888 eta: 2 days, 11:45:53 batch_cost: 0.4040 data_cost: 0.0005 ips: 19.8000 images/s
[07/04 15:07:40] ppdet.engine INFO: Epoch: [0] [ 650/7329] learning_rate: 0.000033 loss_class: 0.814888 loss_bbox: 0.652022 loss_giou: 1.263505 loss_class_aux: 4.560444 loss_bbox_aux: 4.430242 loss_giou_aux: 8.005415 loss_class_dn: 0.604041 loss_bbox_dn: 0.820365 loss_giou_dn: 1.305081 loss_class_aux_dn: 2.902153 loss_bbox_aux_dn: 4.106123 loss_giou_aux_dn: 6.590990 loss: 35.503151 eta: 2 days, 11:30:47 batch_cost: 0.3863 data_cost: 0.0005 ips: 20.7074 images/s
[07/04 15:08:03] ppdet.engine INFO: Epoch: [0] [ 700/7329] learning_rate: 0.000035 loss_class: 0.857192 loss_bbox: 0.552369 loss_giou: 1.086274 loss_class_aux: 4.628265 loss_bbox_aux: 3.728952 loss_giou_aux: 7.016881 loss_class_dn: 0.619368 loss_bbox_dn: 0.738904 loss_giou_dn: 1.247447 loss_class_aux_dn: 2.904344 loss_bbox_aux_dn: 3.695195 loss_giou_aux_dn: 6.333776 loss: 34.237926 eta: 2 days, 11:34:39 batch_cost: 0.4132 data_cost: 0.0005 ips: 19.3599 images/s
[07/04 15:08:25] ppdet.engine INFO: Epoch: [0] [ 750/7329] learning_rate: 0.000038 loss_class: 0.908259 loss_bbox: 0.549284 loss_giou: 1.055723 loss_class_aux: 5.006126 loss_bbox_aux: 3.712269 loss_giou_aux: 6.599482 loss_class_dn: 0.584488 loss_bbox_dn: 0.741655 loss_giou_dn: 1.197559 loss_class_aux_dn: 2.794070 loss_bbox_aux_dn: 3.833305 loss_giou_aux_dn: 6.021413 loss: 33.135147 eta: 2 days, 11:34:25 batch_cost: 0.4072 data_cost: 0.0005 ips: 19.6480 images/s
[07/04 15:08:47] ppdet.engine INFO: Epoch: [0] [ 800/7329] learning_rate: 0.000040 loss_class: 0.867941 loss_bbox: 0.485300 loss_giou: 1.014525 loss_class_aux: 4.928567 loss_bbox_aux: 3.233761 loss_giou_aux: 6.424962 loss_class_dn: 0.546647 loss_bbox_dn: 0.710170 loss_giou_dn: 1.178975 loss_class_aux_dn: 2.668061 loss_bbox_aux_dn: 3.592177 loss_giou_aux_dn: 5.991127 loss: 30.773624 eta: 2 days, 11:30:30 batch_cost: 0.4005 data_cost: 0.0004 ips: 19.9771 images/s

@hhaAndroid (Collaborator) commented:

@nijkah So does this mean there is an issue with the official code? Or is it that the official code training is fine, but there are issues with reproducing it in mmdetection? Converting from Paddle to PyTorch is difficult, so if it's too challenging, perhaps we can wait for the official release of the PyTorch code or try training rtdetr in yolov8 to see if we can reproduce it.

@nijkah (Contributor, Author) commented Aug 25, 2023

r18vd, 8 bs X 2 gpu

Accumulating evaluation results...
DONE (t=26.18s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.441
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.629
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.480
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.272
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.474
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.593
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.659 
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.700
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.821
08/24 22:53:54 - mmengine - INFO - bbox_mAP_copypaste: 0.441 0.629 0.480 0.272 0.474 0.593
08/24 22:53:56 - mmengine - INFO - Epoch(val) [72][2500/2500]    coco/bbox_mAP: 0.4410  coco/bbox_mAP_50: 0.6290  coco/bbox_mAP_75: 0.4800  coco/bbox_mAP_s: 0.2720  coco/bbox_mAP_m: 0.4740  coco/bbox_mAP_l: 0.5930  data_time: 0.0012  time: 0.0187

mmdet: 44.10 (-2.4)
ppdet: 46.5

The model trained with the modification from @rydenisbak, plus SyncBN, showed unsatisfactory performance:
14e bbox_mAP_copypaste: 0.377 0.544 0.406 0.219 0.402 0.522

@hhaAndroid (Collaborator) commented:

@nijkah I think it's worth spending some more time going through the model section, because I just found out that the denoising_class_embed module in the CDN isn't actually trained in the official code, while the rt-detr in mmdet does train it.

        else:
            target = output_memory.gather(dim=1, \
                index=topk_ind.unsqueeze(-1).repeat(1, 1, output_memory.shape[-1]))

        if denoising_class is not None:
            target = torch.concat([denoising_class, target], 1)
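            # NOTE (added for clarity): `target`, which now includes
            # denoising_class, is detached below, so denoising_class_embed
            # receives no gradient through this path.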

        return target.detach(), reference_points_unact.detach(), enc_topk_bboxes, enc_topk_logits
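
A minimal plain-PyTorch sketch (not the PR's code) of the effect described above: once the concatenated tensor is detached, the denoising embedding table can never receive gradients.

import torch
import torch.nn as nn

denoising_class_embed = nn.Embedding(80, 256)
dn_labels = torch.randint(0, 80, (1, 100))
denoising_class = denoising_class_embed(dn_labels)  # (1, 100, 256)
target = torch.randn(1, 300, 256)                   # stand-in for gathered encoder features

merged = torch.concat([denoising_class, target], dim=1).detach()
print(merged.requires_grad)  # False: nothing upstream of the detach,
                             # including denoising_class_embed.weight, is updated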

Apart from the model section, I also noticed a few differences: (1) SyncBN, and (2) the weight-decay parameters. However, these don't seem to have a significant impact.

optimizer:
  type: AdamW
  params:
    -
      params: '^(?=.*backbone)(?=.*norm).*$'
      lr: 0.00001
      weight_decay: 0.  # backbone norm layers
    -
      params: '^(?=.*backbone)(?!.*norm).*$'
      lr: 0.00001  # backbone params other than norm
    -
      params: '^(?=.*(?:encoder|decoder))(?=.*(?:norm|bias)).*$'
      weight_decay: 0.  # norm and bias of the remaining layers

  lr: 0.0001
  betas: [0.9, 0.999]
  weight_decay: 0.0001
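
For reference, a sketch of how these regex-based groups could be built for torch.optim.AdamW (my assumed translation, not ppdet's actual builder):

import re
import torch

def build_rtdetr_param_groups(model: torch.nn.Module) -> torch.optim.AdamW:
    backbone_norm, backbone_rest, enc_dec_norm_bias, others = [], [], [], []
    for name, param in model.named_parameters():
        if re.match(r'^(?=.*backbone)(?=.*norm).*$', name):
            backbone_norm.append(param)        # lr 1e-5, no weight decay
        elif re.match(r'^(?=.*backbone)(?!.*norm).*$', name):
            backbone_rest.append(param)        # lr 1e-5
        elif re.match(r'^(?=.*(?:encoder|decoder))(?=.*(?:norm|bias)).*$', name):
            enc_dec_norm_bias.append(param)    # no weight decay
        else:
            others.append(param)
    return torch.optim.AdamW(
        [
            dict(params=backbone_norm, lr=1e-5, weight_decay=0.0),
            dict(params=backbone_rest, lr=1e-5),
            dict(params=enc_dec_norm_bias, weight_decay=0.0),
            dict(params=others),
        ],
        lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-4)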

The data augmentation section does have some differences. To rule out whether data augmentation is really the cause, we can load the same weights and the same samples and run 100 iterations to see whether the losses diverge consistently. If data augmentation is confirmed as the cause, it might be worth replacing it with data augmentation from torchvision. There are still quite a few uncertainties at the moment, so we need to investigate them one by one.

@hhaAndroid (Collaborator) commented:

@nijkah The author of rt-detr made some modifications in the init query section, taking references from both deformable detr and dino. I'm not sure if it was intentionally set up that way.

@nijkah (Contributor, Author) commented Aug 28, 2023

r50vd, 4 bs X 4 gpu

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.516
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.707
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.557
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.334
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.563
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.684
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.705
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.707
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.707
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.550
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.747
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.865
08/25 18:16:31 - mmengine - INFO - bbox_mAP_copypaste: 0.516 0.707 0.557 0.334 0.563 0.684
08/25 18:16:34 - mmengine - INFO - Epoch(val) [72][1250/1250]    coco/bbox_mAP: 0.5160  coco/bbox_mAP_50: 0.7070  coco/bbox_mAP_75: 0.5570  coco/bbox_mAP_s: 0.3340  coco/bbox_mAP_m: 0.5630  coco/bbox_mAP_l: 0.6840  data_time: 0.0013  time: 0.0266

@hhaAndroid (Collaborator) commented:

@nijkah Hi. Shouldn't we focus on migrating the data augmentation pipeline and confirm if it's the cause of the issue? This part has a significant impact.

@nijkah (Contributor, Author) commented Aug 28, 2023

@hhaAndroid Okay. Actually, I haven't been able to investigate the model section yet.

I'll follow these steps.

  1. Confirm that the model difference in training affects the performance. (I think the model in test mode is tested enough.)
  2. Migrate Data Augmentation Pipeline one by one.

@rydenisbak (Contributor) commented Aug 29, 2023

Hi, my experiment is done. I also see a metrics drop.
45.8: paddle, 72 epochs, 2 cards, bs=4 per card, SyncBN
44.6: mmdet, 72 epochs, 1 card, bs=8 per card, no SyncBN

Interestingly, I can't reproduce the author's result of 46.5 claimed in the PaddleDetection repo.
Anyway, my main target is to reproduce PaddleDet.

My augmentation is a little bit different (though still not the same as paddle's), and I got 44.6 instead of 44.1 like @nijkah.
I think reproducing the data augmentation one by one is the right way.

@hhaAndroid (Collaborator) commented:

@nijkah @rydenisbak I trained the official code r18vd using 4x3090, and the performance of the best model is 46.0.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.460
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.627
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.498
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.282
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.494
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.622
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.361
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.616
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.686
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.492
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.732
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.862
best_stat:  {'epoch': 69, 'coco_eval_bbox': 0.45973773294988696}

rtdetr_r18vd_dlc1m8i5txcccsq8-master-0_2023-08-30 10_21_35.txt

@feivellau commented:

r18vd, 32 bs X 4 gpu

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.456
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.641
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.495
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.299
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.490
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.606
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.672
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.501
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.710
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.835
09/12 18:00:05 - mmengine - INFO - bbox_mAP_copypaste: 0.456 0.641 0.495 0.299 0.490 0.606
09/12 18:00:08 - mmengine - INFO - Epoch(val) [72][40/40]    coco/bbox_mAP: 0.4560  coco/bbox_mAP_50: 0.6410  coco/bbox_mAP_75: 0.4950  coco/bbox_mAP_s: 0.2990  coco/bbox_mAP_m: 0.4900  coco/bbox_mAP_l: 0.6060 

On the review diff:

valid_idx = labels < self.cls_out_channels
# assign iou score to the corresponding label
cls_iou_targets[valid_idx,
                labels[valid_idx]] = iou_score[valid_idx]

A Contributor reviewer suggested, for AMP training:

iou_score[valid_idx] -> iou_score[valid_idx].to(cls_iou_targets.dtype)
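
A small repro sketch of why the cast helps (my assumed failure mode, with cls_out_channels=80 as an example value): under AMP the logits, and hence cls_iou_targets, can be float16 while iou_score stays float32, and PyTorch's indexed assignment requires matching dtypes.

import torch

cls_out_channels = 80
cls_iou_targets = torch.zeros(4, cls_out_channels, dtype=torch.float16)
labels = torch.tensor([2, 79, 80, 5])  # 80 == background, filtered out below
iou_score = torch.rand(4, dtype=torch.float32)

valid_idx = labels < cls_out_channels
# without .to(...), this assignment can raise a dtype-mismatch RuntimeError under AMP
cls_iou_targets[valid_idx, labels[valid_idx]] = \
    iou_score[valid_idx].to(cls_iou_targets.dtype)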


@jiesonshan commented:

r18, 4x3090, batch size 16
official PyTorch code:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.464
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.634
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.504
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.283
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.496
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.627
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.362
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.616
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.685
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.729
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.860
mmdetection:

no SyncBN: AP = 0.430

SyncBN:
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.439
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.607
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.474
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.259
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.470
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.611
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.666
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.669
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.461
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.713
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.860

The data augmentation section does have some differences, as do the decay parameters:
Expand fills with 0
multi-scale processing happens in DetDataPreprocessor, not in the transforms
transforms include: PhotoMetricDistortion, Expand, MinIoURandomCrop, RandomFlip, Resize
optimizer parameters: all bias and norm weights use weight_decay=0
backbone weights and backbone norm weights: lr 1e-5, wd 1e-4
other weights and other norm weights: lr 1e-4, wd 1e-4
(a condensed paramwise_cfg sketch follows after the results below)
With SyncBN (after the changes above):
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.464
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.636
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.504
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.290
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.500
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.626
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.686
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.689
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.689
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.505
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.733
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.862
10/12 03:39:59 - mmengine - INFO - bbox_mAP_copypaste: 0.464 0.636 0.504 0.290 0.500 0.626
10/12 03:40:01 - mmengine - INFO - Epoch(val) [72][79/79] coco/bbox_mAP: 0.4640 coco/bbox_mAP_50: 0.6360 coco/bbox_mAP_75: 0.5040 coco/bbox_mAP_s: 0.2900 coco/bbox_mAP_m: 0.5000 coco/bbox_mAP_l: 0.6260 data_time: 0.0218 time: 0.1140
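
A condensed paramwise_cfg sketch of the lr/decay rules listed above (taken
from the full config posted further down in this thread; the regex
custom_keys use lookaheads to separate backbone norm weights from the other
backbone weights):

    optim_wrapper = dict(
        type='OptimWrapper',
        optimizer=dict(type='AdamW', lr=1e-4, weight_decay=1e-4),
        clip_grad=dict(max_norm=0.1, norm_type=2),
        paramwise_cfg=dict(
            bias_decay_mult=0.0,  # all biases: weight_decay = 0
            norm_decay_mult=0.0,  # all norm weights: weight_decay = 0
            custom_keys={
                # backbone norm weights: lr x 0.1 -> 1e-5
                '^(?=.*backbone)(?=.*norm).*$': dict(decay_mult=0.0, lr_mult=0.1),
                # remaining backbone weights: lr x 0.1 -> 1e-5
                '^(?=.*backbone)(?!.*norm).*$': dict(lr_mult=0.1),
            }),
        constructor='RTDetrOptimizerConstructor')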

@hhaAndroid

@feivellau

The data augmentation section does have some differences, as do the decay parameters:
Expand fills with 0
multi-scale processing happens in DetDataPreprocessor, not in the transforms
transforms include: PhotoMetricDistortion, Expand, MinIoURandomCrop, RandomFlip, Resize
optimizer parameters: all bias and norm weights use weight_decay=0
backbone weights and backbone norm weights: lr 1e-5, wd 1e-4
other weights and other norm weights: lr 1e-4, wd 1e-4

Can you provide your configuration file? Thank you.

@jiesonshan

jiesonshan commented Oct 16, 2023

The data augmentation section does have some differences, as do the decay parameters:
Expand fills with 0
multi-scale processing happens in DetDataPreprocessor, not in the transforms
transforms include: PhotoMetricDistortion, Expand, MinIoURandomCrop, RandomFlip, Resize
optimizer parameters: all bias and norm weights use weight_decay=0
backbone weights and backbone norm weights: lr 1e-5, wd 1e-4
other weights and other norm weights: lr 1e-4, wd 1e-4

Can you provide your configuration file? Thank you.

log_level = 'INFO'
load_from = None
resume = True
work_dir = './saved/rtdetr_debug'
# dataset settings
dataset_type = 'CocoDataset'
data_root = '/data/coco2017/'

backend_args = None

pretrained = '/data/torch_models/resnet18vd_pretrained.pth'  # noqa

eval_size = (640, 640)
model = dict(
    type='RTDETR',
    num_queries=300,  # num_matching_queries
    with_box_refine=True,
    as_two_stage=True,
    eval_size=eval_size,
    use_syncbn=True,
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[0, 0, 0],
        std=[255, 255, 255],
        bgr_to_rgb=True,
        pad_size_divisor=32,
        batch_augments=[
            dict(
                type='BatchSyncRandomResizeV1',  # rewrite
                multi_scales=(
                    (480, 480), (512, 512), (544, 544),
                    (576, 576), (608, 608), (640, 640),
                    (640, 640), (640, 640), (672, 672),
                    (704, 704), (736, 736), (768, 768),
                    (800, 800)),
                interval=1)]
        ),
    # backbone=dict(
    #     type='ResNetV1d',
    #     depth=18,
    #     num_stages=4,
    #     out_indices=(1, 2, 3),
    #     frozen_stages=-1,
    #     norm_cfg=dict(type='BN', requires_grad=True),
    #     norm_eval=False,
    #     style='pytorch',
    #     init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    backbone=dict(
        type='PResNet',
        depth=18,
        freeze_at=-1,
        freeze_norm=False,
        pretrained=True,
        variant='d',
        out_indices=[1, 2, 3],
        num_stages=4,
        ),
    neck=dict(
        type='HybridEncoder',
        num_encoder_layers=1,
        in_channels=[128, 256, 512],
        use_encoder_idx=[2],
        expansion=0.5,
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_heads=8,
                               dropout=0.0),  # 0.1 for DeformDETR
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=1024,  # 1024 for DeformDETR
                ffn_drop=0.0,
                act_cfg=dict(type='GELU'))),
        projector=dict(
            type='ChannelMapper',
            in_channels=[256, 256, 256],
            kernel_size=1,
            out_channels=256,
            act_cfg=None,
            norm_cfg=dict(type='BN'),
            num_outs=3)),  # 0.1 for DeformDETR
    encoder=None,
    decoder=dict(
        num_layers=3,
        eval_idx=-1,
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_heads=8,
                               dropout=0.0),  # 0.1 for DeformDETR
            cross_attn_cfg=dict(
                embed_dims=256,
                num_levels=3,  # 4 for DeformDETR
                dropout=0.0),  # 0.1 for DeformDETR
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=1024,  # 2048 for DINO
                ffn_drop=0.0)),  # 0.1 for DeformDETR
        post_norm_cfg=None),
    positional_encoding=dict(
        num_feats=128,
        normalize=True,
        offset=0.0,  # -0.5 for DeformDETR
        temperature=20),  # 10000 for DeformDETR
    bbox_head=dict(
        type='RTDETRHead',
        num_classes=80,
        sync_cls_avg_factor=True,
        loss_cls=dict(
            type='VarifocalLossV1',
            use_sigmoid=True,
            use_rtdetr=True,
            gamma=2.0,
            alpha=0.75,  # 0.25 in DINO
            loss_weight=1.0),  # 2.0 in DeformDETR
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    dn_cfg=dict(
        label_noise_scale=0.5,
        box_noise_scale=1.0,  # 0.4 for DN-DETR
        group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(
            type='HungarianAssigner',
            match_costs=[
                dict(type='FocalLossCost', weight=2.0),  # same as official
                dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                dict(type='IoUCost', iou_mode='giou', weight=2.0)
            ])),
    test_cfg=dict(max_per_img=300))  # 100 for DeformDETR

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PhotoMetricDistortion',
         hue_delta=13,
         prob=0.8),
    dict(
        type='Expand',
        # mean=[103.53, 116.28, 123.675],
        mean=[0, 0, 0],
        to_rgb=False,
        ratio_range=(1, 4),
        prob=0.5),
    # dict(
    #     type='RandomCropV1',
    #     crop_size=(0.3, 1.0),
    #     crop_type='relative_range',
    #     prob=0.8),
    dict(type='MinIoURandomCrop', aspect_ratio=[.5, 2.], prob=0.8),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='Resize',
        scale=eval_size,
        keep_ratio=False,
        interpolation='bilinear'), # bilinear bicubic
    # dict(
    #     type='RandomChoiceResize',
    #     resize_type='ResizeV1',
    #     scales=[(480, 480), (512, 512), (544, 544), (576, 576), (608, 608),
    #             (640, 640), (640, 640), (640, 640), (672, 672), (704, 704),
    #             (736, 736), (768, 768), (800, 800)],
    #     keep_ratio=False,
    #     random_interpolation=True),
    # dict(
    #     type='Pad',
    #     size=(800, 800)),
    dict(type='PackDetInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(
        type='Resize',
        scale=eval_size,
        keep_ratio=False,
        interpolation='bilinear'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]

train_dataloader = dict(
    batch_size=4,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32), pipeline=train_pipeline,
        backend_args=backend_args))

val_dataloader = dict(
    batch_size=16,
    num_workers=4,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader

val_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + 'annotations/instances_val2017.json',
    metric='bbox',
    format_only=False,
    backend_args=backend_args)
test_evaluator = val_evaluator

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW',
        lr=0.0001,  # 0.0002 for DeformDETR
        weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(
        bias_decay_mult=0.0,
        norm_decay_mult=0.0,
        custom_keys={
            '^(?=.*backbone)(?=.*norm).*$': dict(decay_mult=0.0, lr_mult=0.1),
            '^(?=.*backbone)(?!.*norm).*$': dict(lr_mult=0.1),
            }),
    constructor="RTDetrOptimizerConstructor",  # rewrite
    )  # custom_keys contains sampling_offsets and reference_points in DeformDETR  # noqa

# learning policy
max_epochs = 72
train_cfg = dict(
    type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1)

val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

param_scheduler = [
    # dict(
    #     type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
    #     end=2000),
    dict(
        type='MultiStepLR',
        begin=0,
        end=max_epochs,
        by_epoch=True,
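        # note: the milestone (epoch 1000) is beyond max_epochs=72, so the
        # decay step never fires and the LR stays constant throughout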
        milestones=[1000],
        gamma=0.1)
]

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (2 samples per GPU)
auto_scale_lr = dict(base_batch_size=16)

custom_hooks = [
    dict(
        type='EMAHook',
        ema_type='ExpMomentumEMA',
        momentum=0.0001,
        update_buffers=True,
        priority=49),
]

default_scope = 'mmdet'

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'),
)

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)

# Imports required by the two custom classes below (not part of the original
# config paste):
import logging
import random
import re
from typing import List, Optional, Tuple, Union

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from torch.nn import GroupNorm, LayerNorm
from torch.nn.modules.batchnorm import _BatchNorm
from torch.nn.modules.instancenorm import _InstanceNorm

from mmengine.dist import barrier, broadcast, get_dist_info
from mmengine.logging import MessageHub, print_log
from mmengine.optim import DefaultOptimWrapperConstructor
from mmengine.utils.dl_utils import mmcv_full_available  # assumed location
from mmdet.registry import MODELS, OPTIM_WRAPPER_CONSTRUCTORS
from mmdet.structures import DetDataSample

@MODELS.register_module()
class BatchSyncRandomResizeV1(nn.Module):
    """Batch random resize which synchronizes the random size across ranks.

    Args:
        random_size_range (tuple): The multi-scale random range during
            multi-scale training.
        interval (int): The iter interval of change
            image size. Defaults to 10.
        size_divisor (int): Image size divisible factor.
            Defaults to 32.
    """

    def __init__(self,
                 multi_scales: Tuple = (
                     (480, 480), (512, 512), (544, 544),
                     (576, 576), (608, 608), (640, 640),
                     (640, 640), (640, 640), (672, 672),
                     (704, 704), (736, 736), (768, 768),
                     (800, 800)),
                 interval: int = 1) -> None:
        super().__init__()
        self.multi_scales = multi_scales
        self.rank, self.world_size = get_dist_info()
        self._input_size = None
        self._interval = interval

    def forward(
        self, inputs: Tensor, data_samples: List[DetDataSample]
    ) -> Tuple[Tensor, List[DetDataSample]]:
        """resize a batch of images and bboxes to shape ``self._input_size``"""
        h, w = inputs.shape[-2:]
        if self._input_size is None:
            self._input_size = (h, w)
        scale_y = self._input_size[0] / h
        scale_x = self._input_size[1] / w
        if scale_x != 1 or scale_y != 1:
            inputs = F.interpolate(
                inputs,
                size=self._input_size,
                # mode='bilinear',
                # align_corners=False
                )
            for data_sample in data_samples:
                img_shape = (int(data_sample.img_shape[0] * scale_y),
                             int(data_sample.img_shape[1] * scale_x))
                pad_shape = (int(data_sample.pad_shape[0] * scale_y),
                             int(data_sample.pad_shape[1] * scale_x))
                data_sample.set_metainfo({
                    'img_shape': img_shape,
                    'pad_shape': pad_shape,
                    'batch_input_shape': self._input_size
                })
                data_sample.gt_instances.bboxes[
                    ...,
                    0::2] = data_sample.gt_instances.bboxes[...,
                                                            0::2] * scale_x
                data_sample.gt_instances.bboxes[
                    ...,
                    1::2] = data_sample.gt_instances.bboxes[...,
                                                            1::2] * scale_y
                if 'ignored_instances' in data_sample:
                    data_sample.ignored_instances.bboxes[
                        ..., 0::2] = data_sample.ignored_instances.bboxes[
                            ..., 0::2] * scale_x
                    data_sample.ignored_instances.bboxes[
                        ..., 1::2] = data_sample.ignored_instances.bboxes[
                            ..., 1::2] * scale_y
        message_hub = MessageHub.get_current_instance()
        if (message_hub.get_info('iter') + 1) % self._interval == 0:
            self._input_size = self._get_random_size(device=inputs.device)
        return inputs, data_samples

    def _get_random_size(self, device: torch.device) -> Tuple[int, int]:
        """Randomly choose a shape from ``multi_scales`` and broadcast it to
        all ranks."""
        tensor = torch.LongTensor(2).to(device)
        if self.rank == 0:
            size = random.choice(self.multi_scales)
            tensor[0] = size[0]
            # was `size[0]`, which only worked because every entry in
            # multi_scales is square
            tensor[1] = size[1]
        barrier()
        broadcast(tensor, 0)
        input_size = (tensor[0].item(), tensor[1].item())
        return input_size
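
# For context, a hedged sketch of the varifocal loss configured by loss_cls
# above (vanilla VarifocalNet form; the VarifocalLossV1 with use_rtdetr=True
# in this PR additionally builds the IoU-aware targets shown in the review
# snippet earlier in this thread):
def varifocal_loss_sketch(pred_logits: Tensor, targets: Tensor,
                          alpha: float = 0.75, gamma: float = 2.0) -> Tensor:
    """``targets`` hold the IoU with the matched GT for positives and 0 for
    negatives; same shape as ``pred_logits``."""
    pred = pred_logits.sigmoid()
    # positives are weighted by their IoU target, negatives by a focal term
    focal_weight = targets * (targets > 0).float() + \
        alpha * (pred - targets).abs().pow(gamma) * (targets <= 0).float()
    loss = F.binary_cross_entropy_with_logits(
        pred_logits, targets, reduction='none') * focal_weight
    return loss.sum()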

@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class RTDetrOptimizerConstructor(DefaultOptimWrapperConstructor):
    """RTDetr constructor for optimizers.

    It has the following functions:

        - divides the optimizer parameters into 3 groups:
        Conv, Bias and BN

        - support `weight_decay` parameter adaption based on
        `batch_size_per_gpu`

    Args:
        optim_wrapper_cfg (dict): The config dict of the optimizer wrapper.
            Positional fields are

                - ``type``: class name of the OptimizerWrapper
                - ``optimizer``: The configuration of optimizer.

            Optional fields are

                - any arguments of the corresponding optimizer wrapper type,
                  e.g., accumulative_counts, clip_grad, etc.

            The positional fields of ``optimizer`` are

                - `type`: class name of the optimizer.

            Optional fields are

                - any arguments of the corresponding optimizer type, e.g.,
                  lr, weight_decay, momentum, etc.

        paramwise_cfg (dict, optional): Parameter-wise options. Must include
            `base_total_batch_size` if not None. If the total input batch
            is smaller than `base_total_batch_size`, the `weight_decay`
            parameter will be kept unchanged, otherwise linear scaling.

    Example:
        >>> model = torch.nn.modules.Conv1d(1, 1, 1)
        >>> optim_wrapper_cfg = dict(
        >>>     dict(type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01,
        >>>         momentum=0.9, weight_decay=0.0001, batch_size_per_gpu=16))
        >>> paramwise_cfg = dict(base_total_batch_size=64)
        >>> optim_wrapper_builder = YOLOv5OptimizerConstructor(
        >>>     optim_wrapper_cfg, paramwise_cfg)
        >>> optim_wrapper = optim_wrapper_builder(model)
    """

    def __init__(self,
                 optim_wrapper_cfg: dict,
                 paramwise_cfg: Optional[dict] = None):
        super().__init__(optim_wrapper_cfg, paramwise_cfg)

    def add_params(self,
                   params: List[dict],
                   module: nn.Module,
                   prefix: str = '',
                   is_dcn_module: Optional[Union[int, float]] = None) -> None:
        """Add all parameters of module to the params list.

        The parameters of the given module will be added to the list of param
        groups, with specific rules defined by paramwise_cfg.

        Args:
            params (list[dict]): A list of param groups, it will be modified
                in place.
            module (nn.Module): The module to be added.
            prefix (str): The prefix of the module
            is_dcn_module (int|float|None): If the current module is a
                submodule of DCN, `is_dcn_module` will be passed to
                control conv_offset layer's learning rate. Defaults to None.
        """
        # get param-wise options
        custom_keys = self.paramwise_cfg.get('custom_keys', {})
        # first sort with alphabet order and then sort with reversed len of str
        sorted_keys = sorted(sorted(custom_keys.keys()), key=len, reverse=True)

        bias_lr_mult = self.paramwise_cfg.get('bias_lr_mult', None)
        bias_decay_mult = self.paramwise_cfg.get('bias_decay_mult', None)
        norm_decay_mult = self.paramwise_cfg.get('norm_decay_mult', None)
        dwconv_decay_mult = self.paramwise_cfg.get('dwconv_decay_mult', None)
        flat_decay_mult = self.paramwise_cfg.get('flat_decay_mult', None)
        bypass_duplicate = self.paramwise_cfg.get('bypass_duplicate', False)
        dcn_offset_lr_mult = self.paramwise_cfg.get('dcn_offset_lr_mult', None)

        # special rules for norm layers and depth-wise conv layers
        is_norm = isinstance(module,
                             (_BatchNorm, _InstanceNorm, GroupNorm, LayerNorm, nn.modules.batchnorm._NormBase))
        is_dwconv = (
            isinstance(module, torch.nn.Conv2d)
            and module.in_channels == module.groups)

        for name, param in module.named_parameters(recurse=False):
            param_group = {'params': [param]}
            param_group = {'params': [param]}
            if bypass_duplicate and self._is_in(param_group, params):
                print_log(
                    f'{prefix} is duplicate. It is skipped since '
                    f'bypass_duplicate={bypass_duplicate}',
                    logger='current',
                    level=logging.WARNING)
                continue
            if not param.requires_grad:
                params.append(param_group)
                continue

            # if the parameter match one of the custom keys, ignore other rules
            is_custom = False
            for key in sorted_keys:
                # unlike the default constructor's substring check, custom
                # keys are matched here as regular expressions
                if param.requires_grad and re.findall(key, f'{prefix}.{name}'):
                    is_custom = True
                    lr_mult = custom_keys[key].get('lr_mult', 1.)
                    param_group['lr'] = self.base_lr * lr_mult
                    if self.base_wd is not None:
                        decay_mult = custom_keys[key].get('decay_mult', 1.)
                        param_group['weight_decay'] = self.base_wd * decay_mult
                    # add custom settings to param_group
                    for k, v in custom_keys[key].items():
                        param_group[k] = v
                    break

            if not is_custom:
                # bias_lr_mult affects all bias parameters
                # except for norm.bias dcn.conv_offset.bias
                if name == 'bias' and not (
                        is_norm or is_dcn_module) and bias_lr_mult is not None:
                    param_group['lr'] = self.base_lr * bias_lr_mult

                if (prefix.find('conv_offset') != -1 and is_dcn_module
                        and dcn_offset_lr_mult is not None
                        and isinstance(module, torch.nn.Conv2d)):
                    # deal with both dcn_offset's bias & weight
                    param_group['lr'] = self.base_lr * dcn_offset_lr_mult

                # apply weight decay policies
                if self.base_wd is not None:
                    # norm decay
                    if is_norm and norm_decay_mult is not None:
                        param_group[
                            'weight_decay'] = self.base_wd * norm_decay_mult
                    # bias lr and decay
                    elif (name == 'bias' and not is_dcn_module
                          and bias_decay_mult is not None):
                        param_group[
                            'weight_decay'] = self.base_wd * bias_decay_mult
                    # depth-wise conv
                    elif is_dwconv and dwconv_decay_mult is not None:
                        param_group[
                            'weight_decay'] = self.base_wd * dwconv_decay_mult
                    # flatten parameters except dcn offset
                    elif (param.ndim == 1 and not is_dcn_module
                          and flat_decay_mult is not None):
                        param_group[
                            'weight_decay'] = self.base_wd * flat_decay_mult
            params.append(param_group)
            # params_files = "./params.txt"
            # full_n = f'{prefix}.{name}' if prefix else name
            # infos = f"{full_n} lr:{param_group.get('lr', self.base_lr)} weight_decay: {param_group.get('weight_decay', self.base_wd)}\n"
            # if not os.path.isfile(params_files):
            #     with open(params_files, 'w') as fs:
            #         fs.write(infos)
            # else:
            #     with open(params_files, 'a') as fs:
            #         fs.write(infos)

            for key, value in param_group.items():
                if key == 'params':
                    continue
                full_name = f'{prefix}.{name}' if prefix else name
                print_log(
                    f'paramwise_options -- {full_name}:{key}={value}',
                    logger='current')
                # params_files = "./params.txt"
                # if not os.path.isfile(params_files):
                #     with open(params_files, 'w') as fs:
                #         fs.write(f"{full_name}:{key}={value}\n")
                # else:
                #     with open(params_files, 'a') as fs:
                #         fs.write(f"{full_name}:{key}={value}\n")


        if mmcv_full_available():
            from mmcv.ops import DeformConv2d, ModulatedDeformConv2d
            is_dcn_module = isinstance(module,
                                       (DeformConv2d, ModulatedDeformConv2d))
        else:
            is_dcn_module = False
        for child_name, child_mod in module.named_children():
            child_prefix = f'{prefix}.{child_name}' if prefix else child_name
            self.add_params(
                params,
                child_mod,
                prefix=child_prefix,
                is_dcn_module=is_dcn_module)
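
# Usage note: this constructor is selected via
# constructor='RTDetrOptimizerConstructor' in the optim_wrapper above; the
# regex custom_keys then give every backbone parameter lr 1e-5 (lr_mult=0.1)
# while the rest of the model keeps the base lr 1e-4.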

The backbone uses the official RT-DETR backbone (PResNet) because the BN layer names in mmdet's deep-stem ResNet do not match the pretrained weights.
@feival

@feivellau

I have followed your advice for training, but have not made any changes to the source code.
data_preprocessor=dict(
    type='DetDataPreprocessor',
    mean=[0, 0, 0],
    std=[255, 255, 255],
    bgr_to_rgb=True,
    pad_size_divisor=32,
    batch_augments=[
        dict(type='BatchSyncRandomResize', random_size_range=(480, 800))
    ]),

paramwise_cfg=dict(
    custom_keys={'backbone': dict(lr_mult=0.1)},
    norm_decay_mult=0,
    bias_decay_mult=0)

  • r18 ,1xa40 batch_size 32,syncbn,64 epoch result
    Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.466
    Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.637
    Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.504
    Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.287
    Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.502
    Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.624
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.691
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.694
    Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.694
    Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.519
    Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.734
    Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.860
    10/17 08:43:55 - mmengine - INFO - bbox_mAP_copypaste: 0.466 0.637 0.504 0.287 0.502 0.624
    10/17 08:43:57 - mmengine - INFO - Epoch(val) [64][157/157] coco/bbox_mAP: 0.4660 coco/bbox_mAP_50: 0.6370 coco/bbox_mAP_75: 0.5040 coco/bbox_mAP_s: 0.2870 coco/bbox_mAP_m: 0.5020 coco/bbox_mAP_l: 0.6240 data_time: 0.0307 time: 0.3151

@jiesonshan

I have followed your advice for training, but have not made any changes to the source code. [r18, 1xA40, batch size 32, SyncBN, 64-epoch results quoted above]

Can you provide the version of the source code? Thank you. I can't get the same results with batch size 16.

... torch.linspace(0, W - 1, W, dtype=torch.float32, device=device))
grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1)

valid_wh = torch.tensor([H, W], dtype=torch.float32, device=device)
Contributor
rydenisbak commented Oct 20, 2023

We need to change this line to

valid_wh = torch.tensor([W, H], dtype=torch.float32, device=device)

otherwise I get problems with non-square images.
This bug is also in the original code, so I filed an issue about it:
PaddlePaddle/PaddleDetection#8680
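
A minimal sketch of why [W, H] is the right order (meshgrid layout assumed
from the diff above; grid stores (x, y) pairs, so x must be normalized by W
and y by H):

    import torch

    H, W = 480, 640  # non-square input
    grid_y, grid_x = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing='ij')
    grid = torch.cat([grid_x.unsqueeze(-1), grid_y.unsqueeze(-1)], -1)
    valid_wh = torch.tensor([W, H], dtype=torch.float32)  # not [H, W]
    norm_grid = grid / valid_wh  # x / W and y / H both land in [0, 1)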

@feivellau

I have followed your advice for training, but have not made any changes to the source code. [config excerpt and r18, 1xA40, batch size 32, SyncBN, 64-epoch results quoted above]

Can you provide the version of the source code? Thank you. I can't get the same results with batch size 16.

My code and weights are based on https://github.com/nijkah/mmdetection/blob/rtdetr/ and #10498 (comment).
The config is shown below:

_base_ = ['../_base_/datasets/coco_detection.py', '../_base_/default_runtime.py']

max_epochs = 72
train_batch_size_per_gpu = 32
train_num_workers = 8
persistent_workers = True
eval_size = (640, 640)
norm_cfg = dict(type='SyncBN', requires_grad=True)
pretrained = './pretrained/resnet18vd_pretrained.pth'  # noqa

model = dict(
    type='RTDETR',
    num_queries=300,  # num_matching_queries
    with_box_refine=True,
    as_two_stage=True,
    eval_size=eval_size,
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[0, 0, 0],
        std=[255, 255, 255],
        bgr_to_rgb=True,
        pad_size_divisor=32,
        batch_augments=[dict(type='BatchSyncRandomResize', random_size_range=(480, 800))]),
    backbone=dict(type='ResNetV1d',
                  depth=18,
                  num_stages=4,
                  out_indices=(1, 2, 3),
                  frozen_stages=-1,
                  norm_cfg=norm_cfg,
                  norm_eval=False,
                  style='pytorch',
                  init_cfg=dict(type='Pretrained', checkpoint=pretrained)),
    neck=dict(
        type='HybridEncoder',
        num_encoder_layers=1,
        in_channels=[128, 256, 512],
        use_encoder_idx=[2],
        expansion=0.5,
        norm_cfg=norm_cfg,
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0),  # 0.1 for DeformDETR
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=1024,  # 1024 for DeformDETR
                ffn_drop=0.0,
                act_cfg=dict(type='GELU'))),
        projector=dict(type='ChannelMapper',
                       in_channels=[256, 256, 256],
                       kernel_size=1,
                       out_channels=256,
                       act_cfg=None,
                       norm_cfg=norm_cfg,
                       num_outs=3)),  # 0.1 for DeformDETR
    encoder=None,
    decoder=dict(
        num_layers=3,
        eval_idx=-1,
        layer_cfg=dict(
            self_attn_cfg=dict(embed_dims=256, num_heads=8, dropout=0.0),  # 0.1 for DeformDETR
            cross_attn_cfg=dict(
                embed_dims=256,
                num_levels=3,  # 4 for DeformDETR
                dropout=0.0),  # 0.1 for DeformDETR
            ffn_cfg=dict(
                embed_dims=256,
                feedforward_channels=1024,  # 2048 for DINO
                ffn_drop=0.0)),  # 0.1 for DeformDETR
        post_norm_cfg=None),
    positional_encoding=dict(
        num_feats=128,
        normalize=True,
        offset=0.0,  # -0.5 for DeformDETR
        temperature=20),  # 10000 for DeformDETR
    bbox_head=dict(
        type='RTDETRHead',
        num_classes=80,
        sync_cls_avg_factor=True,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            use_rtdetr=True,
            gamma=2.0,
            alpha=0.75,  # 0.25 in DINO
            loss_weight=1.0),  # 2.0 in DeformDETR
        loss_bbox=dict(type='L1Loss', loss_weight=5.0),
        loss_iou=dict(type='GIoULoss', loss_weight=2.0)),
    dn_cfg=dict(
        label_noise_scale=0.5,
        box_noise_scale=1.0,  # 0.4 for DN-DETR
        group_cfg=dict(dynamic=True, num_groups=None, num_dn_queries=100)),
    # training and testing settings
    train_cfg=dict(assigner=dict(type='HungarianAssigner',
                                 match_costs=[
                                     dict(type='FocalLossCost', weight=2.0),
                                     dict(type='BBoxL1Cost', weight=5.0, box_format='xywh'),
                                     dict(type='IoUCost', iou_mode='giou', weight=2.0)
                                 ])),
    test_cfg=dict(max_per_img=300))  # 100 for DeformDETR

# train_pipeline, NOTE the img_scale and the Pad's size_divisor is different
# from the default setting in mmdet.
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PhotoMetricDistortion', prob=0.8),
    # dict(type='Expand', mean=[103.53, 116.28, 123.675], to_rgb=True, ratio_range=(1, 4), prob=0.5),
    dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 4), prob=0.5),
    # dict(type='RandomCrop', crop_size=(0.3, 1.0), crop_type='relative_range', prob=0.8),
    dict(type='MinIoURandomCrop', aspect_ratio=[.5, 2.], prob=0.8),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Resize', scale=eval_size, keep_ratio=False, interpolation='bicubic'),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),
    dict(type='PackDetInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', backend_args={{_base_.backend_args}}),
    dict(type='Resize', scale=eval_size, keep_ratio=False, interpolation='bicubic'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape', 'scale_factor'))
]

train_dataloader = dict(batch_size=train_batch_size_per_gpu,
                        num_workers=train_num_workers,
                        persistent_workers=persistent_workers,
                        dataset=dict(filter_cfg=dict(filter_empty_gt=False),
                                     pipeline=train_pipeline))
val_dataloader = dict(batch_size=train_batch_size_per_gpu,
                      num_workers=train_num_workers,
                      persistent_workers=persistent_workers,
                      dataset=dict(pipeline=test_pipeline))
test_dataloader = val_dataloader

# optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW',
        lr=0.0001,  # 0.0002 for DeformDETR
        weight_decay=0.0001),
    clip_grad=dict(max_norm=0.1, norm_type=2),
    paramwise_cfg=dict(custom_keys={'backbone': dict(lr_mult=0.1)},
                       norm_decay_mult=0,
                       bias_decay_mult=0)
)  # custom_keys contains sampling_offsets and reference_points in DeformDETR  # noqa

# learning policy

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=max_epochs, val_interval=1)

val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

param_scheduler = [
    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=2000),
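    # note: gamma=1.0 means no decay at the milestone (epoch 100, which is
    # beyond max_epochs=72 anyway), so the LR stays at its base value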
    dict(type='MultiStepLR', begin=0, end=max_epochs, by_epoch=True, milestones=[100], gamma=1.0)
]

# NOTE: `auto_scale_lr` is for automatically scaling LR,
# USER SHOULD NOT CHANGE ITS VALUES.
# base_batch_size = (8 GPUs) x (2 samples per GPU)
auto_scale_lr = dict(enable=True, base_batch_size=16)
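# A quick arithmetic check of what enable=True does above (assuming
# mmengine's linear scaling rule):
#   real_lr = base_lr * total_batch_size / base_batch_size
#           = 1e-4 * (32 * 1) / 16 = 2e-4  # train_batch_size_per_gpu=32, 1 GPU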

custom_hooks = [
    dict(type='EMAHook',
         ema_type='ExpMomentumEMA',
         momentum=0.0001,
         update_buffers=True,
         priority=49),
]

@feivellau

[quote of the preceding exchange and the full config above]

@jiesonshan @hhaAndroid @nijkah
Based on this, I trained on 4 GPUs (A40) with SyncBN:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.470
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.642
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.510
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.298
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.505
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.622
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.690
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.731
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.859
10/29 07:11:54 - mmengine - INFO - bbox_mAP_copypaste: 0.470 0.642 0.510 0.298 0.505 0.622
10/29 07:11:58 - mmengine - INFO - Epoch(val) [72][40/40]    coco/bbox_mAP: 0.4700  coco/bbox_mAP_50: 0.6420  coco/bbox_mAP_75: 0.5100  coco/bbox_mAP_s: 0.2980  coco/bbox_mAP_m: 0.5050  coco/bbox_mAP_l: 0.6220  data_time: 0.0351  time: 0.2554
10/29 07:11:58 - mmengine - INFO - The previous best checkpoint /dahuafs/userdata/229288/00_deeplearning/08_meta/02_OD/mmdetection/runs/rtdetr/debug_1026/best_coco_bbox_mAP_epoch_69.pth is removed
10/29 07:12:02 - mmengine - INFO - The best checkpoint with 0.4700 coco/bbox_mAP at 72 epoch is saved to best_coco_bbox_mAP_epoch_72.pth.

With Expand as dict(type='Expand', mean=[0, 0, 0], to_rgb=True, ratio_range=(1, 4), prob=0.5) or dict(type='Expand', mean=[103.53, 116.28, 123.675], to_rgb=True, ratio_range=(1, 4), prob=0.5), the results show no difference.

@flytocc

flytocc commented Nov 4, 2023

I plan to add support for RT-DETR to MMDetection and have already completed the code for the r18vd arch model in rtdetr.

I didn't notice this existing PR during my coding, and there are some differences between my implementation and this PR. I haven't encountered this situation before.

@hhaAndroid What should I do next? Merge my work into this PR? Or submit a new PR?

@HaoLiuHust

Any plan to merge this? Does this PR reproduce the results?

@twmht
Contributor

twmht commented Dec 19, 2023

Any update on this?

@flytocc

flytocc commented Jan 5, 2024

[quote of the Nov 4 comment above]

I trained rtdetr_r18vd on 1 GPU (V100) with total batch size 16: 46.5 (w/ amp) and 46.6 (w/o amp):

with amp

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.465
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.639
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.503
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.286
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.501
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.625
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.733
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.872

without amp

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.466
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.640
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.505
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.289
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.498
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.629
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.692
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.496
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.733
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.864

@flytocc

flytocc commented Jan 12, 2024

[quote of the Jan 5 comment and the r18vd results above]

I trained rtdetr_r50vd on 1 GPU (V100) with total batch size 16: 52.7 w/ amp (53.1 in the paper):

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.527
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.710
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.567
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.571
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.701
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.721
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.723
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.723
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.548
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.766
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.880

@flytocc

flytocc commented Jan 15, 2024

After fixing the initialization method, I got 53.1 on the rtdetr_r50vd with amp training.

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.531
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.714
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.575
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.351
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.578
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.700
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.722
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.724
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.724
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.549
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.766
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.883

@hhaAndroid I think the reproduction of rtdetr is done (46.6 for r18vd, 53.1 for r50vd). Does mmdetection plan to support rtdetr? I would be willing to propose a PR.

@twmht
Contributor

twmht commented Jan 17, 2024

@flytocc

Can you help to propose a PR?

@flytocc flytocc mentioned this pull request Jan 17, 2024
@flytocc

flytocc commented Jan 17, 2024

@flytocc

Can you help to propose a PR?

@twmht #11395

@ocrhei

ocrhei commented Feb 1, 2024

Really looking forward to RT-DETR landing.

@sounakdey

sounakdey commented Feb 14, 2024

Really looking forward for the release of RT-DETR
