
Confirmation for some training details #7

Closed
strongwolf opened this issue Jun 16, 2020 · 9 comments

@strongwolf

Hi. I want to confirm some details about the second, self-training stage. Are all the hyper-parameters (including the batch size, the positive/negative thresholds, the number of proposals in the RCNN head, etc.) the same for both the supervised and unsupervised losses? Also, is the unsupervised loss imposed on both the RPN and the RCNN head? Thanks.

@zizhaozhang
Collaborator

Yes, all hyper-parameters for data with human labels and data with pseudo labels (predicted offline) are treated the same.

This is done simply by calling the forward function multiple times, once on each of the paired batches: https://github.com/google-research/ssl_detection/blob/master/detection/modeling/generalized_stac_rcnn.py#L181
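For intuition, here is a minimal sketch of that pattern (hypothetical names, not the repo's actual API): the same detector forward pass runs once on the labeled batch and once on the pseudo-labeled batch, and the two losses are summed.

```python
# Sketch only: hypothetical names, not the actual generalized_stac_rcnn API.
def total_loss(model, labeled_batch, pseudo_batch, unsup_weight=1.0):
    # Forward pass on images with human annotations; returns the summed
    # RPN + RCNN losses for this batch.
    sup_loss = model.forward(labeled_batch)
    # Identical forward pass on images with offline-predicted pseudo labels;
    # the same anchors, thresholds, and proposal sampling apply.
    unsup_loss = model.forward(pseudo_batch)
    # Both terms include RPN and RCNN head losses.
    return sup_loss + unsup_weight * unsup_loss
```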

@strongwolf
Author

Thank you very much. I have two more questions.
First, in the first stage, is the '1x' learning schedule still 12k, 16k, and 18k iterations? If so, I think that number is too large for the 1, 2, 5, and 10% datasets; it would be over 100 epochs at batch size 8.
Second, is the second training stage fine-tuned from the first stage, or trained from ImageNet weights?

@zizhaozhang
Collaborator

1. Yes, your understanding is right: we kept the same number of iterations for both the 1st and 2nd stages. So for smaller amounts of labeled data, this is equivalent to training for more epochs (see the sketch below).

2. We train from ImageNet weights with unlabeled data (with pseudo labels) and labeled data.
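For concreteness, a back-of-the-envelope sketch of what the fixed 18k-step schedule implies in epochs (the COCO train2017 size is approximate; batch size 8 is taken from the question above):

```python
# Rough epochs implied by a fixed step schedule on COCO subsets (approximate).
coco_train = 118_000        # ~size of COCO train2017
batch_size = 8
total_steps = 18_000        # end of the 12k/16k/18k '1x' schedule

for frac in (0.01, 0.02, 0.05, 0.10):
    labeled = coco_train * frac
    epochs = total_steps * batch_size / labeled
    print(f"{frac:.0%} labeled: ~{epochs:.0f} epochs")
# 1% labeled: ~122 epochs
# 2% labeled: ~61 epochs
# 5% labeled: ~24 epochs
# 10% labeled: ~12 epochs
```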

@strongwolf
Author

strongwolf commented Jul 1, 2020

I have one more question. The number of unlabeled images is much larger than the number of labeled ones. If each per-GPU batch contains one labeled image and one unlabeled image, a problem may arise: by the time the model has stopped under-fitting the unlabeled data, it has already over-fitted the labeled data. I don't know whether this is an issue. When I tried to reproduce the method in another framework, I found that the model's performance in the 10% label + 90% unlabel setting doesn't increase much when the learning rate is decayed at the 9th epoch. In contrast, in the 10% label + 20% unlabel setting, performance does increase after the lr decay and results in a higher mAP than the 10% label + 90% unlabel setting.

@zizhaozhang
Collaborator

zizhaozhang commented Jul 2, 2020

Hi, thanks for the follow-up.

I am not quite sure what the question is here: under-fitting or over-fitting? Generalization to your new framework? Or that the learning-rate decay does not improve performance much? Would you mind elaborating and separating the questions?

@strongwolf
Author

> Hi, thanks for the follow-up.
>
> I am not quite sure what the question is here: under-fitting or over-fitting? Generalization to your new framework? Or that the learning-rate decay does not improve performance much? Would you mind elaborating and separating the questions?

I have trained with your code and everything is fine. But when I reproduce it in another framework, some things confuse me. My learning schedule is based on the unlabeled data, and I decay the lr at the 8th epoch. In the 10% label + 90% unlabel case, since there is 9 times as much unlabeled data as labeled data, 8 epochs over the unlabeled data means 72 epochs over the labeled data. I think 72 epochs is too long for 10% labeled data, and the model has over-fitted the 10% labeled data by then, which I guess is why performance doesn't increase when the lr is decayed. In the 10% label + 20% unlabel case, 8 epochs over the unlabeled data means only 16 epochs over the labeled data, and performance does increase after the lr decay because 16 epochs is acceptable.
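To make that arithmetic explicit (a sketch of the reasoning above, using the ratios from the two settings):

```python
# Effective labeled epochs when the schedule is counted over the unlabeled set
# and each batch pairs one labeled image with one unlabeled image (1:1).
decay_epoch_unlabeled = 8

for labeled, unlabeled in [(0.10, 0.90), (0.10, 0.20)]:
    ratio = unlabeled / labeled        # unlabeled images per labeled image
    labeled_epochs = decay_epoch_unlabeled * ratio
    print(f"{labeled:.0%} label + {unlabeled:.0%} unlabel: "
          f"lr decays after ~{labeled_epochs:.0f} labeled epochs")
# 10% label + 90% unlabel: lr decays after ~72 labeled epochs
# 10% label + 20% unlabel: lr decays after ~16 labeled epochs
```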

I am also not sure how important the ratio between labeled and unlabeled data within a batch is. In classification, many papers claim that the batch size for unlabeled data should be larger than that for labeled data.
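For reference, classification-style SSL typically oversamples unlabeled data per step (e.g. FixMatch uses an unlabeled-to-labeled ratio μ = 7). A minimal sketch of that kind of batch composition, assuming plain Python iterables as data sources:

```python
import itertools

def mixed_batches(labeled_data, unlabeled_data, batch_size=8, mu=7):
    """Yield (labeled, unlabeled) batch pairs with an unlabeled batch
    mu times larger; the small labeled set is cycled so every step
    sees both kinds of data."""
    labeled_iter = itertools.cycle(labeled_data)
    unlabeled_iter = iter(unlabeled_data)
    while True:
        x_l = list(itertools.islice(labeled_iter, batch_size))
        x_u = list(itertools.islice(unlabeled_iter, batch_size * mu))
        if len(x_u) < batch_size * mu:   # stop after one pass over unlabeled
            return
        yield x_l, x_u
```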

@kihyuks

kihyuks commented Jul 8, 2020

We decided to rely on training "steps" (e.g., 12k, 16k, 18k iterations) rather than "epochs" to determine the training schedule in this work. In our experiments, we used the exact same number of training steps for the 1, 2, 5, and 10% labeled-data settings. This might be suboptimal for certain settings, but we observed consistent performance improvements while avoiding extra hyperparameter tuning.

We haven't tried increasing the size of the unlabeled batch in this work due to a tight GPU memory budget, but it could be a good addition for a possible performance boost.
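A sketch of what such a step-based schedule looks like; the 10x decay at the 12k/16k boundaries is the common Detectron-style convention assumed here, not read from this repo's config:

```python
def learning_rate(step, base_lr=0.01, boundaries=(12_000, 16_000), gamma=0.1):
    # Piecewise-constant schedule: multiply the lr by gamma at each boundary.
    # The same boundaries are reused for every labeled fraction.
    lr = base_lr
    for boundary in boundaries:
        if step >= boundary:
            lr *= gamma
    return lr

for step in (0, 12_000, 16_000):
    print(step, learning_rate(step))   # 0.01 -> ~0.001 -> ~0.0001
```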

@Chrisfsj2051

Hi @kihyuks and @zizhaozhang, it's great to see such interesting work with remarkable results.

However, I have some difficulty understanding the training configs. According to the code, only one image is processed in a single "step". However, it seems that in stage 2 a "step" contains two images (one labeled and one unlabeled). In that case, the number of training samples seen is steps in stage 1 and steps * 2 in stage 2.

I would really appreciate it if you could point out whether I've understood this correctly.

@zizhaozhang
Collaborator

@Chrisfsj2051 Your understanding is correct: although the number of steps is the same, stage 2 (the SSL setting) views more images than stage 1.
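So under the same step budget, the image counts differ only by the per-step batch composition (a trivial sketch of the point above; the 18k figure is the schedule discussed earlier):

```python
steps = 18_000
print("stage 1:", steps * 1)   # supervised only: one image per step
print("stage 2:", steps * 2)   # SSL: one labeled + one unlabeled per step
```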
