Are samples used for warmup training and gradient calculation the same? #5

ZigeW · 2024-03-01T07:52:00Z

Hi,

I'm trying to run experiments following the instructions given in the README.

I found that in Step 1 (warmup training), 5% of the samples are randomly selected to train $M_S$. But in Step 2 (building the gradient datastore), the samples used to calculate gradients seem to be fixed as the first 200 samples of each dataset.

This leaves me unsure whether the samples used for warmup training and for gradient calculation should be the same. Could you kindly explain?

xiamengzhou · 2024-03-12T01:47:08Z

Hi, sorry for the late reply!

In the first step, we use 5% of the full dataset for warmup training to obtain the Adam optimizer states. When calculating the gradients in the second step, you should use the full dataset, including the data used for warmup training. Let me know if you have more questions!
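For readers who want to see the relationship between the two sample sets, here is a minimal sketch in plain Python. It is not the repository's actual code: the function name `split_indices`, the `seed` parameter, and the dataset size of 1000 are all hypothetical and chosen only for illustration. It shows a random 5% subset being picked for warmup while the gradient step covers every example, so the warmup samples are a subset of the gradient samples.

```python
import random


def split_indices(num_examples, warmup_fraction=0.05, seed=0):
    """Hypothetical helper: pick indices for warmup and for gradient computation."""
    rng = random.Random(seed)
    warmup_size = max(1, round(num_examples * warmup_fraction))
    # Step 1: a random subset (5% by default) is used for warmup training.
    warmup_indices = sorted(rng.sample(range(num_examples), warmup_size))
    # Step 2: gradients are computed over the full dataset,
    # which includes the examples already used for warmup.
    gradient_indices = list(range(num_examples))
    return warmup_indices, gradient_indices


warmup, grad = split_indices(1000)
print(len(warmup), len(grad))
```

So the two sets are not the same, but they overlap by design: every warmup example also appears in the gradient datastore.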

xiamengzhou closed this as completed Jun 17, 2024
