
Class Error Problem #26

Closed
Hatins opened this issue Jun 21, 2023 · 43 comments

Hatins commented Jun 21, 2023

Hi, @orrzohar,
I'm facing the same issue as described in issue #24. I followed the installation instructions and used four TITAN (12G) GPUs to run the code. I set the batch size to 2 and reduced the learning rate from 2e-4 to 8e-5, as well as the lr_backbone from 2e-5 to 8e-6.
However, even after 2 epochs, the class_error metrics remain quite high, ranging from 80 to 100. Is this normal? If not, what could be causing this problem?

[screenshot: class_error values during training]
The value keeps fluctuating and is unstable.

orrzohar self-assigned this Jun 22, 2023
@orrzohar (Owner)

Hi @Hatins,
Have you tried using the same hyper-parameters as in OW-DETR?
They used a 4-GPU system with a batch size of 2, and our class predictions follow the exact same inference scheme.
Best,
Orr

Hatins (Author) commented Jun 23, 2023

Hi @orrzohar,
Thank you for your reply. I will try experimenting with the OW-DETR parameters as soon as possible.

Hatins closed this as completed Jun 26, 2023
@orrzohar (Owner)

Hi @Hatins,
As you closed this issue, I assume that the class error issue has been resolved.

If that is the case, could you please write the hyper parameters here for future reference?

Best,
Orr

Hatins (Author) commented Jun 26, 2023

Hi @orrzohar
I haven't actually tried it yet, as I've been busy with other things recently. I'm closing this issue now because I don't want to disrupt your record of having resolved all the issues on GitHub (^_^). However, once I find some time and successfully solve the problem, I will post an explanation here.

Sidd1609 commented Jun 26, 2023

@Hatins @orrzohar I am also having the same issue, but I am only using 1 GPU (I do not have a Slurm setup, but I do have a single machine with 2 GPUs).

[screenshot: training log with high class_error]

I configured the hyperparameters as lr: 8e-6 and lr_backbone: 8e-8, and I get results like:
person has 0 predictions.
bicycle has 0 predictions.
car has 1165400 predictions.

How can I resolve this issue?

@orrzohar (Owner)

Hi @Hatins,
That is much appreciated, haha! Yeah, I really hope PROB can be as reproducible as possible and serve as an easy baseline for future methods, thanks to its simplicity and its very few hyper-parameters :). When you get back to this, please feel free to re-open the issue so I can help troubleshoot running PROB with a batch size of 2.

@Sidd1609, could you try the OW-DETR hyper-parameters, since you are using the same batch size as they did? Namely:

    parser.add_argument('--lr', default=2e-4, type=float)
    parser.add_argument('--lr_backbone', default=2e-5, type=float)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--lr_drop', default=40, type=int)
    parser.add_argument('--clip_max_norm', default=0.1, type=float,
                        help='gradient clipping max norm')
    parser.add_argument('--mask_loss_coef', default=1, type=float)
    parser.add_argument('--dice_loss_coef', default=1, type=float)
    parser.add_argument('--cls_loss_coef', default=2, type=float)
    
    parser.add_argument('--bbox_loss_coef', default=5, type=float)
    parser.add_argument('--giou_loss_coef', default=2, type=float)
    parser.add_argument('--focal_alpha', default=0.25, type=float)

OW-DETR and PROB are both exactly the DDETR model for the known classes during training (note that the revised inference scheme is used only at test/evaluation time, while class_error is calculated during training, so objectness plays no role here). The only other possibility is that you have an issue with your dataset configuration, but that seems rather unlikely.

Best,
Orr

Hatins (Author) commented Jun 28, 2023

Hi @orrzohar
Thanks for your explanation and conscientiousness! I found that the hyper-parameters of OW-DETR are set as follows:

    parser = argparse.ArgumentParser('Deformable DETR Detector', add_help=False)
    parser.add_argument('--lr', default=2e-4, type=float)
    parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
    parser.add_argument('--lr_backbone', default=2e-5, type=float)
    parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
    parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
    parser.add_argument('--batch_size', default=2, type=int)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--epochs', default=51, type=int)
    parser.add_argument('--lr_drop', default=40, type=int)
    parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
    parser.add_argument('--clip_max_norm', default=0.1, type=float,
                        help='gradient clipping max norm')

So I will use those parameters, run a test tonight, and report the results as soon as possible!
By the way, it would be nice if there were a formula to compute all the parameters from the number of GPUs and batch size (^_^)

Hatins reopened this Jun 28, 2023
@Sidd1609

Hi @orrzohar, thank you so much for your explanation; I will definitely test it out with the OW-DETR hyper-parameters. I did notice the similarities with OW-DETR, and I am a little worried I might not be able to run with a batch size larger than 2 (due to the constraint of a single GPU).

Hatins (Author) commented Jun 29, 2023

I'm sorry, I can't run the experiment right now because the machine is occupied by others. But from the results I have so far (4 epochs), the class error still fluctuates between 50 and 95. Is that normal? When you train, does the value decrease smoothly without fluctuating?

orrzohar (Owner) commented Jul 4, 2023

Hi @Hatins,
Sorry for taking a bit to reply (I was OOO).

I am not sure, as I never trained PROB on a system like yours. I can share my evaluation curves if that helps, and then you could tell whether your mAP/U-Recall have similar trends to mine. They may differ due to the different batch_size, but they should hopefully follow a similar trend.

Let me know how this goes / whether the class error issue is resolved - I'll update the codebase accordingly to support systems such as yours, as this seems to be a common setup.

Best,
Orr

orrzohar (Owner) commented Jul 4, 2023

Hi @Sidd1609,

That is fine - OW-DETR was trained with a batch size of 2 so this should hopefully work.

Best,
Orr

Hatins (Author) commented Jul 4, 2023

Hi @orrzohar
No problem, this isn't an urgent question. I found a machine with four NVIDIA GeForce RTX 3090 GPUs and ran it for a while. It seems that I obtained relatively normal results after 13 epochs, as shown in the image below. Previously, when I used a GPU with 12 GB of VRAM, the trend seemed similar to the graph below. However, for some strange reason, the server kept encountering an error (it kept failing at the fourth epoch), but it shouldn't be an issue with your code.
The result after 13 epochs (from wandb):
[wandb screenshot: training curves after 13 epochs]

The error on the machine with 12 GB of VRAM:
[photo of the error message]

I will update my results once the training epochs finish.
Finally, I wonder whether the settings should depend on the number of GPUs and the batch size. For example, if I set lr = 0.0002 when using 4 GPUs with batch size 2, what lr should I set when using 8 GPUs with batch size 2?
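For reference, a common heuristic (not something prescribed in this thread or by the PROB authors) is the linear scaling rule: scale the learning rate with the total effective batch size, i.e. per-GPU batch size × number of GPUs. A minimal sketch under that assumption, using the OW-DETR defaults quoted above as the baseline:

    # Hypothetical helper illustrating the linear scaling rule; the baseline is
    # lr = 2e-4 at 4 GPUs x batch_size 2 (effective batch size 8), as quoted above.
    def scaled_lr(base_lr, base_gpus, base_batch, n_gpus, batch_size):
        return base_lr * (n_gpus * batch_size) / (base_gpus * base_batch)

    print(scaled_lr(2e-4, 4, 2, 8, 2))  # 4e-4 for 8 GPUs x batch_size 2
    print(scaled_lr(2e-4, 4, 2, 1, 2))  # 5e-5 for a single GPU x batch_size 2

Whether this heuristic actually helps for PROB/OW-DETR is not verified in this thread; the comments above suggest the unscaled OW-DETR values already work on 4-GPU machines.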

Hatins (Author) commented Jul 5, 2023

Hi @orrzohar @Sidd1609
I think the hyper-parameters of OW-DETR are universal. The reason class_error stayed at 100 was simply insufficient training time. After training the model for 40 epochs, the trend of class_error looks normal:
[wandb screenshot: class_error curve]

By the way, I would like to ask about the main results in the paper. Are mAP and U-Recall represented by the K_AP50 and U_R50 curves in wandb? If so, could you help me verify whether my trends look correct? If not, could you tell me which curves to look at?
mAP:
[wandb screenshot: K_AP50 curve]

U-Recall:
[wandb screenshot: U_R50 curve]

Interestingly, U_R50 did not increase as training time increased.

OK, that's all. I think you can close the issue now. Thank you once again for your patient responses; I feel incredibly fortunate to have encountered such a responsible author!

orrzohar (Owner) commented Jul 5, 2023

Hi @Hatins,

I am happy to help! Yes, these are very similar to the curves I got on my machine:
[wandb screenshot: the author's evaluation curves]

I actually looked into why this happens, and it has to do with PROB learning a good representation of objectness very early on (which is why U-Recall initially jumps; if you plot U-Recall within epoch 1, you will see it increase from ~0 to 19). Then, as training progresses, it starts declining as the model makes more known-object predictions, and therefore fewer unknown-object predictions (e.g., ~U-Recall@100 goes down to ~U-Recall@80).

I will update the readme with this hyper parameter setup & machine type for future users.

If you encounter any new issues - do not hesitate to reach out,
Orr

orrzohar (Owner) commented Jul 5, 2023

Hi @Sidd1609,

I am going to close this issue now - please let me know whether, when training on a 1-GPU system, you see the same dynamics, so I can update the readme to include your system (as well as details about your setup - are you using a single TITAN 12G?).

Best,
Orr

orrzohar closed this as completed Jul 5, 2023
Sidd1609 commented Jul 7, 2023

Hi @orrzohar
I am not able to observe the same trend as @Hatins. I changed some parameters from the OW-DETR setting, as follows:

    parser.add_argument('--lr', default=8e-5, type=float)
    parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
    parser.add_argument('--lr_backbone', default=8e-6, type=float)
    parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
    parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
    parser.add_argument('--batch_size', default=2, type=int)
    parser.add_argument('--weight_decay', default=1e-4, type=float)
    parser.add_argument('--epochs', default=10, type=int)
    parser.add_argument('--lr_drop', default=35, type=int)
    parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
    parser.add_argument('--clip_max_norm', default=0.1, type=float,
                        help='gradient clipping max norm')

I do observe a decline in class_error and loss. However, the trend is not smooth; class_error keeps fluctuating between 0 and 100, and the loss is within 18-21 after 10 epochs.

Hatins (Author) commented Jul 8, 2023

Hi @Sidd1609
I don't think you need to change the lr from 2e-4 to 8e-5 or the lr_backbone from 2e-5 to 8e-6. The fluctuation may be normal. Have you looked at the curve on wandb? Although volatile, it still trends downward. If you train for enough epochs, you should see a clear trend in the curve; with only ten epochs the trend may not be very obvious yet. Also, I noticed that you didn't post the wandb curve - did you encounter an error when using wandb?

Sidd1609 commented Jul 8, 2023

Hi @Hatins, yes - sharing the results from wandb for 100 epochs:

[wandb screenshot: training curves over 100 epochs]

Hatins (Author) commented Jul 10, 2023

Hi @Sidd1609
It seems the curve does trend downward, but the shape is quite strange /(ㄒoㄒ)/~~
Have you tried both lr = 2e-4 and lr = 8e-5? Did neither of them work, or did you only try lr = 8e-5?

@orrzohar (Owner)

Hi @Hatins,

Notice that as he is using only two GPUs, his effective batch size (= batch_size × number of GPUs) is 1/2 of yours - so 80K steps for @Sidd1609 correspond to ~40K steps for you. Also, you show a grouped plot while Sidd does not - so you are showing the average over 4 such processes - perhaps that's why yours looks smoother?

@Sidd1609, how are the K/PK/CK mAP and U-Recall progressing? Are they close to the figures I sent earlier? If so, I am not sure how critical the class_error is. If they are not, then I echo @Hatins' question - what lr's did you try?

Best,
Orr
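To make the step-count comparison above concrete, here is a small sketch (an assumed relationship, not code from the PROB repository): with the same per-GPU batch_size, the number of images seen is steps × GPUs × batch_size, so equal-progress step counts scale inversely with the GPU count.

    # Hypothetical helper for comparing step counts across machines with
    # different GPU counts but the same per-GPU batch_size.
    def equivalent_steps(steps, gpus_from, gpus_to, per_gpu_batch=2):
        images_seen = steps * gpus_from * per_gpu_batch
        return images_seen // (gpus_to * per_gpu_batch)

    print(equivalent_steps(80_000, gpus_from=2, gpus_to=4))  # 40000, as noted above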

@Sidd1609

Hi @Hatins and @orrzohar, yes, as Orr mentioned, I am using only 1 GPU to train the model. I ran the tests using the following lr's (scaled according to the batch size and GPUs I had). Also, I was originally running the model for only 1 experience, which means I would not have unknown classes. Right now I am running it for 1+1 (2 experiences) instead of 0+2 (1 experience):

-> 8e-5
-> 2e-5

Sharing the plots for K/PK/CK mAP and U-Recall. Some plots are similar to the ones you shared.

When I try to use the model for predictions, I do not get good results, even with lower thresholds.

CK mAP & Recall: [wandb screenshots]

K mAP & Recall: [wandb screenshots]

U mAP & Recall: [wandb screenshots]

Sidd1609 commented Jul 11, 2023

@orrzohar, another question: does the image size matter here? That is, is there a parameter controlling the image size that I could modify and experiment with?

And would it be possible to share a visualization script for testing the model?

@orrzohar (Owner)

Hi @Sidd1609,

I am a little confused. Your known mAP is over 80! This means you are not discussing the class_error on M-OWODB or S-OWODB, but on your custom dataset. Then, you can no longer expect the class_error to follow the same trend - as it will be heavily influenced by factors such as the dataset size, number of classes, and annotation error rate.

However, your known mAP is quite high, so you don't have a significant issue with class_error. As for U-Recall, you most likely have a bug in your code such that you have effectively 0 unknown objects annotated during evaluation.

Let me know if I am wrong and you are reporting values on M/S-OWODB, and if that is the case, then please let me know how you increased K_AP50 to be over 80.

Best,
Orr
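One way to check the "effectively 0 unknown objects annotated" hypothesis above is to count the ground-truth boxes in the evaluation split whose class is outside the known-class list. A hypothetical sanity check, assuming VOC-style XML annotations; the class names and path below are placeholders, not the PROB defaults:

    import glob
    import xml.etree.ElementTree as ET

    KNOWN_CLASSES = {"person", "bicycle", "car"}  # replace with your task's known classes
    ANN_DIR = "data/OWOD/Annotations"             # placeholder path - adjust to your setup

    # If this prints 0, the evaluator has no unknown objects to recall,
    # and U-Recall will be 0 by construction.
    unknown_boxes = 0
    for xml_path in glob.glob(f"{ANN_DIR}/*.xml"):
        for obj in ET.parse(xml_path).getroot().iter("object"):
            if obj.findtext("name") not in KNOWN_CLASSES:
                unknown_boxes += 1
    print(f"unknown ground-truth boxes in the eval split: {unknown_boxes}")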

orrzohar reopened this Jul 12, 2023
@Sidd1609

Hi @orrzohar, you are right, those were for the custom dataset. However, I ran on M-OWODB and the results look the same:

[wandb screenshot]

@orrzohar (Owner)

Hi @Sidd1609,

Could you please explain your concern with class_error if the known mAP is high? class_error is only to help oversee training and is not a performance metric.

As you are using a smaller batch size, the class error will of course be more erratic. Class error is calculated per batch, so rather than displaying class_error over 8 images, you are displaying it over 2 images. Let's say you have 3 objects per image, so 6 objects total. If you get one prediction correct, the class error drops from 100 to 83%. Meanwhile, if your batch size were 8 (24 objects), a single correct prediction would only move the class_error from 100 to 96%. So I do not expect the same behavior for class_error.

HOWEVER, what is important is the known mAP (The class_error is ONLY for the known classes). If you have high known mAP, this is irrelevant.

Also, notice that 40K steps on 1 GPU correspond to 10K steps on Hatins' 4-GPU machine, so this is also quite zoomed in.
Best,
Orr
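A minimal sketch of the per-batch arithmetic above, assuming class_error = 100 × (1 − fraction of matched known objects classified correctly); this is an illustration, not the PROB implementation:

    # Per-batch class error for a given number of correct predictions.
    def class_error(num_correct, num_objects):
        return 100.0 * (1.0 - num_correct / num_objects)

    # Batch of 2 images with ~3 objects each -> 6 objects; one correct prediction
    print(class_error(1, 6))   # ~83.3: a single hit moves the per-batch metric a lot
    # Batch of 8 images -> 24 objects; the same single correct prediction
    print(class_error(1, 24))  # ~95.8: barely moves, so the curve looks smoother

Smaller effective batch sizes therefore make the class_error trace noisier without necessarily indicating a training problem.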

@Sidd1609

Hi @orrzohar, I see. I believed that the class error would influence learning in some way, by tracking the classes that were correctly matched to the GT.

I did observe that the computation of class_error in the code is related to the batch size. I understand your point and it makes sense.

Thank you so much for your help. I must say the codebase is very user-friendly for extending to custom datasets; I really appreciate your effort and work.

Regards
Sid

@orrzohar (Owner)

Hi @Sidd1609,

Happy to help!

Just to verify: the same hyper-parameters that worked for Hatins worked for your system as well? If so, I will add your system to the README.

Rzx520 commented Jul 17, 2023

Hi @Hatins,
Task 1 | Task 2 | Task 3 | Task 4
I would like to know what results you got when you used batch_size = 2 for evaluation after training. Can you tell me the evaluation results for the four tasks above? Thanks.

Hatins (Author) commented Jul 17, 2023

Hi @Rzx520
In fact, I only tested Task 1 for verification, with batch_size = 3, and the results were as follows:

> (quoting my Jul 5, 2023 comment above: the OW-DETR hyper-parameters are universal; after 40 epochs the class_error trend looks normal, and the K_AP50 and U_R50 wandb curves are as shown there.)

Rzx520 commented Jul 17, 2023

I cannot get the exact value; the author reported U-Recall: 19.4 and mAP: 59.5 on Task 1. May I ask what your final result is? I would like to know whether there is a significant difference. @Hatins

Hatins (Author) commented Jul 17, 2023

As you can see, I got the same results as @orrzohar shows in the paper. I wonder how many cards you used with batch_size = 2. I think if you use a single card, the results may be worse than mine (I used four cards with batch_size = 3) @Rzx520. By the way, what are your final results? Are they far from the author's?

Rzx520 commented Jul 17, 2023

I'm sorry, but I'm still training. Since training takes so long, I would like to know whether you deviated from the batch size the author used, because I'm afraid the batch size will have a significant impact on the results, and I don't have that many GPUs. I still want to ask: do you mean that the results you got with batch size 3 are the same as the author's? Haha @Hatins

Hatins (Author) commented Jul 17, 2023

Yeah, at first I used a machine with four TITAN cards (12 GB) to train, but that machine kept reporting errors while running, so I switched to another machine with four RTX 3090s (24 GB) and ran with batch_size = 3 or 4. In my opinion, if you use a single card with batch_size = 2, you may well get a poorer result. Maybe you can ask @Sidd1609 for a reference, since he was using one card with batch_size = 2, which is the same as yours. @Rzx520

Rzx520 commented Jul 17, 2023

Yes, I have also been getting errors while using four 12 GB GPUs. I want to know what errors you were getting. I always get an error in the second epoch, after training one epoch, and training stops. @Hatins

Hatins (Author) commented Jul 17, 2023

@Rzx520
[photo of the error message]
The error is the one shown above; I don't know how to solve it, so I changed machines.
I suspect there is a communication problem caused by insufficient memory. You can try batch_size = 1 to see whether that is the reason. Indeed, I no longer have this problem after changing machines.

Rzx520 commented Jul 17, 2023

Indeed, I get the same error, and every time only one card stops running. It can be resolved by using batch_size = 1, but I think the performance may differ significantly. @Hatins

Hatins (Author) commented Jul 17, 2023

Yes, @Rzx520, when I use the RTX 3090s, whether I set batch_size = 3 or 4, it works well with the same parameters as OW-DETR. And as you say, setting batch_size = 1 may decrease the results significantly. You know, transformers always need a lot of memory /(ㄒoㄒ)/~~.

@Sidd1609

Hi @orrzohar, yes, I did use the same configuration as the OW-DETR paper; the only difference was the batch size, which I set to 2.

Rzx520 commented Jul 18, 2023

Hi @Sidd1609, I would like to know what GPU you are using and what your final evaluation results are - are they significantly different from the author's? Thank you.

@Sidd1609

Hi @Rzx520, I am using "one" RTX 3090. But please note that my results were reported on a custom dataset, not the VOC-based splits that Orr reported on.

Rzx520 commented Jul 28, 2023

> (quoting @orrzohar's Jul 5, 2023 comment above about PROB learning objectness early, so U-Recall jumps within epoch 1 and then declines as known-object predictions increase.)

It is indeed like this. We can see that U_R50 decreases even as training time increases. I am quite puzzled - why not choose the checkpoint with the highest U_R50? Haha @orrzohar

Rzx520 commented Nov 17, 2023

> (quoting @Hatins' comment above about card count, batch size, and final results.)

I used four cards with batch_size = 3; the result is:

{"train_lr": 1.999999999999943e-05, "train_class_error": 15.52755644357749, "train_grad_norm": 119.24543388206256, "train_loss": 5.189852057201781, "train_loss_bbox": 0.2700958194790585, "train_loss_bbox_0": 0.29624945830832017, "train_loss_bbox_1": 0.27978440371434526, "train_loss_bbox_2": 0.275065722955665, "train_loss_bbox_3": 0.27241891570675625, "train_loss_bbox_4": 0.27063051075218725, "train_loss_ce": 0.18834440561282928, "train_loss_ce_0": 0.27234036786085974, "train_loss_ce_1": 0.23321395799885028, "train_loss_ce_2": 0.20806531186409408, "train_loss_ce_3": 0.19453731594314128, "train_loss_ce_4": 0.18820172232765492, "train_loss_giou": 0.3351372324140976, "train_loss_giou_0": 0.3679243937037491, "train_loss_giou_1": 0.3483400315024699, "train_loss_giou_2": 0.34171414935044225, "train_loss_giou_3": 0.3379105142249501, "train_loss_giou_4": 0.3368650070453053, "train_loss_obj_ll": 0.02471167313379382, "train_loss_obj_ll_0": 0.034151954339996814, "train_loss_obj_ll_1": 0.03029250531194649, "train_loss_obj_ll_2": 0.0288731191750343, "train_loss_obj_ll_3": 0.028083207809715446, "train_loss_obj_ll_4": 0.026900355121292352, "train_cardinality_error_unscaled": 0.44506890101437985, "train_cardinality_error_0_unscaled": 0.6769398279525907, "train_cardinality_error_1_unscaled": 0.5726976196583499, "train_cardinality_error_2_unscaled": 0.4929900999093851, "train_cardinality_error_3_unscaled": 0.46150593285633223, "train_cardinality_error_4_unscaled": 0.45256225438417086, "train_class_error_unscaled": 15.52755644357749, "train_loss_bbox_unscaled": 0.054019163965779084, "train_loss_bbox_0_unscaled": 0.059249891647616536, "train_loss_bbox_1_unscaled": 0.055956880831476395, "train_loss_bbox_2_unscaled": 0.055013144572493046, "train_loss_bbox_3_unscaled": 0.054483783067331704, "train_loss_bbox_4_unscaled": 0.05412610215448962, "train_loss_ce_unscaled": 0.09417220280641464, "train_loss_ce_0_unscaled": 0.13617018393042987, "train_loss_ce_1_unscaled": 0.11660697899942514, "train_loss_ce_2_unscaled": 0.10403265593204704, "train_loss_ce_3_unscaled": 0.09726865797157064, "train_loss_ce_4_unscaled": 0.09410086116382746, "train_loss_giou_unscaled": 0.1675686162070488, "train_loss_giou_0_unscaled": 0.18396219685187454, "train_loss_giou_1_unscaled": 0.17417001575123495, "train_loss_giou_2_unscaled": 0.17085707467522113, "train_loss_giou_3_unscaled": 0.16895525711247505, "train_loss_giou_4_unscaled": 0.16843250352265265, "train_loss_obj_ll_unscaled": 30.889592197686543, "train_loss_obj_ll_0_unscaled": 42.68994404527915, "train_loss_obj_ll_1_unscaled": 37.86563257517548, "train_loss_obj_ll_2_unscaled": 36.09139981038161, "train_loss_obj_ll_3_unscaled": 35.10401065181873, "train_loss_obj_ll_4_unscaled": 33.62544476769816, "test_metrics": {"WI": 0.05356004827184098, "AOSA": 5220.0, "CK_AP50": 58.3890380859375, "CK_P50": 25.75118307055908, "CK_R50": 71.51227713815234, "K_AP50": 58.3890380859375, "K_P50": 25.75118307055908, "K_R50": 71.51227713815234, "U_AP50": 2.7862398624420166, "U_P50": 0.409358215516747, "U_R50": 16.530874785591767}, "test_coco_eval_bbox": [14.451444625854492, 14.451444625854492, 77.8148193359375, 57.15019607543945, 66.93928527832031, 49.282108306884766, 27.985671997070312, 70.54130554199219, 55.28901290893555, 82.7206039428711, 26.307403564453125, 65.15182495117188, 21.9127197265625, 77.91541290283203, 73.61457061767578, 67.8846206665039, 49.1287841796875, 36.78118896484375, 69.1879653930664, 53.060150146484375, 79.1402359008789, 59.972835540771484, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7862398624420166], "epoch": 40, "n_parameters": 39742295}

The author's results are:
U-Recall: 19.4, K-AP: 59.5
Why can't I reach the author's performance?
@Hatins @orrzohar

Rzx520 commented Nov 21, 2023

> (quoting @Hatins' earlier comments above about testing Task 1 with batch_size = 3 and the K_AP50 / U_R50 wandb curves.)

I saw your picture showing U-Recall at only around 18 - is this before or after the change? Have you reached 19.4 since then? @Hatins
Can four 3090 cards only achieve a performance of around 18? @orrzohar Thanks.
