Training problem when epoch changing #7
Comments
I have the same problem.
@ltcs11 @muyoucun Thanks for pointing out the bug. It is likely caused by dataset.shuffle: (total samples) % (shuffle buffer size) leaves a remainder, and if that remainder is far smaller than the shuffle buffer size, the samples at the end of every epoch are drawn from only that small leftover pool. The model overfits on them, and the loss increases when the next epoch begins (refer to https://stackoverflow.com/questions/46928328/why-training-loss-is-increased-at-the-beginning-of-each-epoch).
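A minimal sketch of the kind of pipeline being discussed, assuming a standard tf.data setup (the file name and batch size are placeholders, not the repo's actual values); it shows where the end-of-epoch remainder comes from and notes two common mitigations:

```python
import tensorflow as tf

BUFFER_SIZE = 35504       # shuffle buffer size reported later in this thread

# With records sorted by label, a window-based shuffle like this only mixes
# samples within BUFFER_SIZE-sized neighbourhoods; the leftover
# (total samples) % BUFFER_SIZE is what fills the buffer at the very end of
# each epoch.
dataset = tf.data.TFRecordDataset("train.tfrecord")   # hypothetical file name
dataset = dataset.shuffle(BUFFER_SIZE)                # window-local shuffle only
dataset = dataset.repeat()                            # epochs are concatenated after the shuffle
dataset = dataset.batch(128)                          # placeholder batch size

# Common mitigations (assumptions, not the author's confirmed fix):
#   1) shuffle the records on disk before writing the TFRecord file, or
#   2) split the data into many shards and interleave them (see the sketch
#      further down in this thread).
```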
Can you provide the hyperparameters you set? My inference loss doesn't converge.
Did you just comment out dataset.shuffle? I used the dataset provided here, and I don't know whether it is already shuffled or not.
The provided dataset is ordered by label number. I just used the default hyperparameters, and I didn't get 99.2+ results either.
I have 5,822,653 images and use a shuffle buffer size of 35,504, so the remainder is 5822653 % 35504 = 35501. Maybe I should split the single TFRecord file into many smaller files.
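A sketch of that sharding idea, assuming the single TFRecord can be rewritten into multiple shards (the file pattern, shard count, and buffer sizes are hypothetical):

```python
import tensorflow as tf

# Assume the data has been rewritten into shards such as
# train-00000-of-00128.tfrecord (hypothetical names).
files = tf.data.Dataset.list_files("train-*-of-*.tfrecord", shuffle=True)

# Read several shards at once so consecutive records come from different files.
dataset = files.interleave(tf.data.TFRecordDataset, cycle_length=8)

# A record-level shuffle on top of the shard-level shuffle; the buffer no
# longer has to span the whole label-sorted dataset to mix labels well.
dataset = dataset.shuffle(10000)
dataset = dataset.repeat()
dataset = dataset.batch(128)
```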
|
The InsightFace author has updated his datasets; you can download them from his GitHub. I tested the shuffle behavior with a small array, something like a = np.array([1, 2, 3, ..., 59]), and printed each element every time it was read.
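A small sketch of that check, written for TF 2.x eager execution (the buffer size and epoch count are assumptions):

```python
import numpy as np
import tensorflow as tf

a = np.arange(1, 60)                       # 1, 2, ..., 59 as in the comment above

dataset = tf.data.Dataset.from_tensor_slices(a)
dataset = dataset.shuffle(buffer_size=10)  # assumed buffer size
dataset = dataset.repeat(2)                # two passes to watch the epoch boundary

for el in dataset:                         # print every element as it is produced
    print(el.numpy())
```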
I followed your advice and found that if I comment out dataset.shuffle, an error occurs.