
total number of iterations #35

Open
holzbock opened this issue Apr 13, 2021 · 2 comments

Comments

@holzbock

If you train the model with multiple GPUs, the total batch size becomes larger (batch_size_total = batch_size * num_gpus), but the number of eval_steps in one epoch stays the same. As a result, the total number of iterations in training increases by a factor of the number of GPUs. In the original TensorFlow implementation, the total number of iterations is independent of the number of GPUs, and the batch is divided across the GPUs.
I'm not 100% sure about this, but if it's correct, either the number of eval_steps per epoch should be reduced, or the batch should be divided across the GPUs, so that the total number of iterations stays constant when using multiple GPUs.
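
One way to implement the suggested fix is to divide the per-epoch iteration count by the number of GPUs so the total step count stays constant. A minimal sketch assuming a torch.distributed setup; the names `eval_step`, `batch_size`, and `epochs` are illustrative, not necessarily the repo's actual arguments:

```python
import torch.distributed as dist

def scaled_schedule(eval_step: int, batch_size: int, epochs: int):
    # Number of processes/GPUs; falls back to 1 for single-GPU runs.
    world_size = dist.get_world_size() if dist.is_available() and dist.is_initialized() else 1

    # The effective batch grows with the number of GPUs ...
    effective_batch_size = batch_size * world_size

    # ... so shrink the per-epoch iteration count to compensate,
    # keeping the total number of optimizer steps roughly constant.
    eval_step_per_epoch = eval_step // world_size
    total_steps = eval_step_per_epoch * epochs
    return effective_batch_size, eval_step_per_epoch, total_steps

# Example: eval_step=1024 on 4 GPUs gives 256 iterations per epoch,
# so total_steps matches the single-GPU schedule (up to integer division).
```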

@mails2amit

I too have a question about the number of iterations. I understand that the number of eval steps should be calculated dynamically with the formula below:
number of eval_steps (iterations) = total number of training images / number of images in a batch
For 4000 images, number of eval steps = 4000 / 64 ≈ 63 iterations in each epoch.

But I do not achieve the reported accuracy if I set eval_steps to 63 and epochs to 1000.
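
As a sanity check of the arithmetic above: 4000 / 64 = 62.5, so 63 corresponds to rounding up. A quick sketch, using the values from this comment:

```python
import math

num_images = 4000  # total number of training images
batch_size = 64    # images per batch

# eval_steps = total training images / batch size, rounded up
eval_steps = math.ceil(num_images / batch_size)
print(eval_steps)  # 63 iterations per epoch
```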

@bryanwong17

May I know how to set the coefficient of the unlabeled batch size (mu) and eval_steps properly?
