
Four 3090s cannot reproduce the authors' results, why? #48

Closed

Rzx520 opened this issue Nov 20, 2023 · 21 comments

Rzx520 (Author) commented Nov 20, 2023

> As you can see, I got the same results as the @orrzohar show in the paper. I wonder how many cards you used with batch_size = 2. I think if you use a single card, the result may be worse than I got (I used four cards with batch_size = 3) @Rzx520 . By the way, what are your final results? Are they far from the authors' results?

I used four cards with batch_size = 3; the result is:

{"train_lr": 1.999999999999943e-05, "train_class_error": 15.52755644357749, "train_grad_norm": 119.24543388206256, "train_loss": 5.189852057201781, "train_loss_bbox": 0.2700958194790585, "train_loss_bbox_0": 0.29624945830832017, "train_loss_bbox_1": 0.27978440371434526, "train_loss_bbox_2": 0.275065722955665, "train_loss_bbox_3": 0.27241891570675625, "train_loss_bbox_4": 0.27063051075218725, "train_loss_ce": 0.18834440561282928, "train_loss_ce_0": 0.27234036786085974, "train_loss_ce_1": 0.23321395799885028, "train_loss_ce_2": 0.20806531186409408, "train_loss_ce_3": 0.19453731594314128, "train_loss_ce_4": 0.18820172232765492, "train_loss_giou": 0.3351372324140976, "train_loss_giou_0": 0.3679243937037491, "train_loss_giou_1": 0.3483400315024699, "train_loss_giou_2": 0.34171414935044225, "train_loss_giou_3": 0.3379105142249501, "train_loss_giou_4": 0.3368650070453053, "train_loss_obj_ll": 0.02471167313379382, "train_loss_obj_ll_0": 0.034151954339996814, "train_loss_obj_ll_1": 0.03029250531194649, "train_loss_obj_ll_2": 0.0288731191750343, "train_loss_obj_ll_3": 0.028083207809715446, "train_loss_obj_ll_4": 0.026900355121292352, "train_cardinality_error_unscaled": 0.44506890101437985, "train_cardinality_error_0_unscaled": 0.6769398279525907, "train_cardinality_error_1_unscaled": 0.5726976196583499, "train_cardinality_error_2_unscaled": 0.4929900999093851, "train_cardinality_error_3_unscaled": 0.46150593285633223, "train_cardinality_error_4_unscaled": 0.45256225438417086, "train_class_error_unscaled": 15.52755644357749, "train_loss_bbox_unscaled": 0.054019163965779084, "train_loss_bbox_0_unscaled": 0.059249891647616536, "train_loss_bbox_1_unscaled": 0.055956880831476395, "train_loss_bbox_2_unscaled": 0.055013144572493046, "train_loss_bbox_3_unscaled": 0.054483783067331704, "train_loss_bbox_4_unscaled": 0.05412610215448962, "train_loss_ce_unscaled": 0.09417220280641464, "train_loss_ce_0_unscaled": 0.13617018393042987, "train_loss_ce_1_unscaled": 0.11660697899942514, "train_loss_ce_2_unscaled": 0.10403265593204704, "train_loss_ce_3_unscaled": 0.09726865797157064, "train_loss_ce_4_unscaled": 0.09410086116382746, "train_loss_giou_unscaled": 0.1675686162070488, "train_loss_giou_0_unscaled": 0.18396219685187454, "train_loss_giou_1_unscaled": 0.17417001575123495, "train_loss_giou_2_unscaled": 0.17085707467522113, "train_loss_giou_3_unscaled": 0.16895525711247505, "train_loss_giou_4_unscaled": 0.16843250352265265, "train_loss_obj_ll_unscaled": 30.889592197686543, "train_loss_obj_ll_0_unscaled": 42.68994404527915, "train_loss_obj_ll_1_unscaled": 37.86563257517548, "train_loss_obj_ll_2_unscaled": 36.09139981038161, "train_loss_obj_ll_3_unscaled": 35.10401065181873, "train_loss_obj_ll_4_unscaled": 33.62544476769816, "test_metrics": {"WI": 0.05356004827184098, "AOSA": 5220.0, "CK_AP50": 58.3890380859375, "CK_P50": 25.75118307055908, "CK_R50": 71.51227713815234, "K_AP50": 58.3890380859375, "K_P50": 25.75118307055908, "K_R50": 71.51227713815234, "U_AP50": 2.7862398624420166, "U_P50": 0.409358215516747, "U_R50": 16.530874785591767}, "test_coco_eval_bbox": [14.451444625854492, 14.451444625854492, 77.8148193359375, 57.15019607543945, 66.93928527832031, 49.282108306884766, 27.985671997070312, 70.54130554199219, 55.28901290893555, 82.7206039428711, 26.307403564453125, 65.15182495117188, 21.9127197265625, 77.91541290283203, 73.61457061767578, 67.8846206665039, 49.1287841796875, 36.78118896484375, 69.1879653930664, 53.060150146484375, 79.1402359008789, 59.972835540771484, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7862398624420166], "epoch": 40, "n_parameters": 39742295}

The authors' results are: U-R: 19.4, K-AP: 59.5.
Why can't I achieve the authors' performance?
@Hatins @orrzohar

Originally posted by @Rzx520 in #26 (comment)
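For reference, the two numbers being compared here, known-class AP (K_AP50) and unknown recall (U_R50), can be read straight out of the test_metrics block of a log record like the one above. A minimal parsing sketch, assuming a Deformable DETR-style log.txt with one JSON record per epoch; the file path is hypothetical:

import json

# Hypothetical path; the format (one JSON object per line, each with a
# "test_metrics" dict) matches the record quoted above.
with open("exps/PROB/log.txt") as f:
    records = [json.loads(line) for line in f if line.strip()]

last = records[-1]["test_metrics"]
print("K_AP50 =", last["K_AP50"])  # known-class AP50 (the paper reports 59.5)
print("U_R50  =", last["U_R50"])   # unknown recall at IoU 0.5 (the paper reports 19.4)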

Rzx520 changed the title to "Four 3090s cannot reproduce the authors' results, why?" Nov 20, 2023
@orrzohar (Owner)

Hi @Rzx520,
When you change optimization hyperparameters, the results will inevitably change. That is true for PROB and nearly all deep learning models.

Luckily, PROB is relatively robust and requires minimal hyperparameter tuning to match our performance, at least on all the systems I have encountered. Specifically, with Titan RTX 3090, our results were already reproduced (see issue #26). On a system of 3090x4, lr_drop needed to be increased to 40 to match our reported results. If you have a different number of GPUs, a different value may work better for your system.

I am happy to help with this process, but to do so, I need to see your training curves.

Best,
Orr
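For context on what lr_drop controls: in the upstream Deformable DETR code that PROB builds on, lr_drop is the epoch at which a StepLR scheduler multiplies the learning rate by 0.1, which is why the best value shifts when the number of GPUs or the batch size changes. A minimal sketch of that convention; the scheduler type and the 0.1 factor are assumptions carried over from the upstream code, and the tiny linear model is only a stand-in:

import torch

# Stand-in module so the snippet is self-contained; in PROB this would be the detector.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)

lr_drop = 40  # the hyperparameter being tuned in this thread
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_drop)  # gamma defaults to 0.1

for epoch in range(51):  # --epochs 51
    # ... one full training epoch would run here ...
    scheduler.step()
    if epoch in (lr_drop - 2, lr_drop - 1):
        # prints 2e-04 after epoch 39 and 2e-05 after epoch 40
        print(f"after epoch {epoch + 1}: lr = {optimizer.param_groups[0]['lr']:.0e}")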

orrzohar self-assigned this Nov 20, 2023
Rzx520 (Author) commented Nov 21, 2023

The result above was obtained after already adjusting lr_drop to 40, so I am quite confused.

Rzx520 (Author) commented Nov 21, 2023

#26 (comment)

@orrzohar (Owner)

Did you use the same number of GPUs as in #26?
If not, please share your training curves and I will try to help you with the hyperparameter optimization.

Rzx520 (Author) commented Nov 22, 2023

Yes, I also used 4 GPUs. Thank you very much. Since I turned off wandb, I have to retrain to obtain the training curves. This may take a while, as the server is currently in use.

Rzx520 (Author) commented Nov 23, 2023

[training curve screenshots attached]

The plots above are the results of training with the parameter settings below. @orrzohar

################ Deformable DETR ################
parser.add_argument('--lr', default=2e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
#parser.add_argument('--batch_size', default=3, type=int)
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=35, type=int)
parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

@orrzohar (Owner)

Hi @Rzx520,
You are overtraining the model; you should reduce lr_drop so that the drop happens at around 150k iterations (lr_drop = 30).
I am concerned that you are using the same system as in #26 but getting different optimization results; I wonder how the two systems differ.
Best,
Orr
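To make the "150k iterations" figure concrete: the epoch at which the drop lands in iteration terms depends on the training-set size and the effective batch size (GPUs x per-GPU batch size). A rough conversion sketch; the 60k-image dataset size below is purely a hypothetical placeholder, not the actual OWOD split size:

import math

def lr_drop_for_budget(target_iters, dataset_size, gpus, batch_per_gpu):
    # Epoch count at which roughly `target_iters` optimizer steps have been taken.
    effective_batch = gpus * batch_per_gpu
    steps_per_epoch = math.ceil(dataset_size / effective_batch)
    return max(1, round(target_iters / steps_per_epoch))

# Hypothetical numbers: a 150k-step budget, 60k training images, 4 GPUs, batch_size=3 each.
print(lr_drop_for_budget(150_000, 60_000, gpus=4, batch_per_gpu=3))  # -> 30

The same arithmetic shows why batch_size=2 and batch_size=3 behave differently: the smaller batch takes 1.5x more optimizer steps per epoch, so the same lr_drop epoch corresponds to a different amount of high-lr training.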

Rzx520 (Author) commented Nov 25, 2023

I am trying lr_drop = 30 and will report back when the training results are available. I also wonder how the two systems differ, so I asked some questions in #26 (comment).

@orrzohar (Owner)

Hi @Rzx520,
I see. I do not know Hatins, so I have no way of facilitating communication.
I am very surprised that you both used 4x3090s but got different results.

Rzx520 closed this as not planned Nov 26, 2023
Rzx520 (Author) commented Nov 26, 2023

[training curve screenshots attached]

The plots above are the results of training with the parameter settings below, lr_drop = 30. @orrzohar

################ Deformable DETR ################
parser.add_argument('--lr', default=2e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
#parser.add_argument('--batch_size', default=3, type=int)
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=35, type=int)
parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

orrzohar reopened this Nov 26, 2023
@orrzohar (Owner)

Hi @Rzx520,
I noticed that you used batch_size=2, not batch_size=3 like Hatins did in #26.
Why is that? It could be a reason for the U_R50 discrepancy.
A broad note: a general trend I see is that the smaller the batch size, the less training you can do without hurting U_R50.

I also noticed that Hatins reported similarly poorer results when using batch_size=2.
Best,
Orr

Rzx520 (Author) commented Nov 27, 2023

> (quoting the batch_size = 3 results and training log already posted at the top of this issue)

What I tried at the beginning was batch_size = 3, and those results are shown at the top of this issue. I set batch_size to 2 because of the parameter settings of OW-DETR. @orrzohar

Rzx520 (Author) commented Nov 27, 2023

I have made some progress now: when I set lr to 1e-4 with lr_drop = 35 and batch_size = 3, there are some gains, but K_AP only reached 58.3, not 59.4. Can you provide some suggestions?

################ Deformable DETR ################
parser.add_argument('--lr', default=1e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
parser.add_argument('--batch_size', default=3, type=int)
#parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=30, type=int)
parser.add_argument('--lr_drop', default=35, type=int)
#parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

[training curve screenshots attached]

@orrzohar (Owner)

Hi @Rzx520,
Are you still using 4 x Titan RTX?
Generally, to get a higher K_AP50 you need to train for longer, but the longer you train, the lower U_R goes. The trick is to strike a balance between the two.
Looking at your chart, I think you can reduce lr_drop to 30, as the last 5 epochs are already saturated before the lr_drop. This will give you 5 additional epochs at the lower learning rate and will hopefully improve the results.

To clarify, you DO NOT need to restart this experiment from scratch -- your run should have saved the checkpoint for epoch 30, so you only need to train the last 10 epochs after the lr_drop. Just make sure the lr is indeed lowered.

Best,
Orr
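A minimal sketch of the check that last step implies, assuming the checkpoints follow the upstream Deformable DETR convention of storing 'model', 'optimizer', 'lr_scheduler', and 'epoch' entries; both the path and the key names are assumptions here:

import torch

# Hypothetical checkpoint path; key layout assumed from the upstream Deformable DETR code.
ckpt = torch.load("exps/PROB/checkpoint0029.pth", map_location="cpu")

print("stored epoch:", ckpt["epoch"])            # resuming continues from epoch + 1
print("scheduler state:", ckpt["lr_scheduler"])  # StepLR state, including last_epoch
print("optimizer lr per param group:",
      [g["lr"] for g in ckpt["optimizer"]["param_groups"]])

If the printed learning rates are still the pre-drop values, adjust the scheduler/optimizer state when resuming so that the remaining 10 epochs actually run at the lowered rate.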

Rzx520 (Author) commented Nov 27, 2023

I am trying lr_drop = 30 and will post the results here. @orrzohar

@orrzohar (Owner)

Hi @Rzx520,
OK great, thank you.
Would you mind confirming what system you are using, for future reproducibility on similar systems?
Best,
Orr

Rzx520 (Author) commented Nov 28, 2023

With lr_drop = 30 and parser.add_argument('--eval_every', default=1, type=int), the results are not as good as with lr_drop = 35. @orrzohar

[training curve screenshots attached]

Linux ubuntu 5.15.0-86-generic #96~20.04.1-Ubuntu

orrzohar (Owner) commented Dec 3, 2023

Hi @Rzx520,
OK, I am trying to compile everything we have seen thus far:

--lr 2e-4, --lr_drop 40, --epochs 51, --batch_size 2 -> AP50=58.4, U_R=16.5
--lr 1e-4, --lr_drop 35, --epochs 51, --batch_size 3 -> AP50=58.4, U_R=19.4
--lr 1e-4, --lr_drop 30, --epochs 51, --batch_size 3 -> worse than the above

Is that correct? Also, since lr_drop 35->30 had an adverse effect, have you tried:
--lr 1e-4, --lr_drop 40, --epochs 51, --batch_size 3

Best,
Orr

Rzx520 (Author) commented Dec 5, 2023

Yes, I do.
--lr 1e-4, --lr_drop 30, --epochs 41, --batch_size 3
[training curve screenshot attached]

AP50=58.1, U_R=19.5

@orrzohar There is one issue: the epochs value for the above results is not the default, but 41.

orrzohar (Owner) commented Dec 17, 2023

Hi @Rzx520,

Are the results above for:
--lr 1e-4, --lr_drop 40, --epochs 51 --batch_size 3?

And of course the hyperparameters are changed -- you changed the batch size as it did not fit on your GPUs, and this will change other hyperparameters.
Best,
Orr

@orrzohar (Owner)

Hi @Rzx520,
I am closing this for now. If you can confirm what configuration you used to get the best results, I will add this to the README.
Best,
Orr
