Use same data for training and validation? #2
Comments
Hi, @ahangchen.
I can't agree that this doesn't affect the final result (the comparison accuracy). The comparison is based on feature extraction from two images: extraction on familiar data will yield higher comparison accuracy, while extraction on unseen data may yield uncertain results and lower accuracy. The limited amount of validation data is a problem, but I would prefer to reuse some of the validation data rather than reuse the training data.
@ahangchen First of all, thank you. We consider it a validation bug, but we think the implicit result we currently use still reflects that the code runs correctly, because the other part is validation data, which is unseen by the model. Furthermore, we do not really care about validation classification, since we evaluate on a retrieval problem.
@ahangchen But if we fix this, we need to ensure there are two validation images from the same ID, which would require some extra code. I still think we do not really care about validation, so to keep the code simple I have decided to keep this version of validation. Besides, I still think the ideal curve should be similar to the current validation curve if we train the model in the right way, so it is fine to use the present version. Anyway, your suggestion is nice! Thank you very much.
@layumi Got it. Thank you for your reply. 😄
Because the functions `rand_same_class` and `rand_diff_class` apply the same operation to the images' `set` attribute in both training and validation, the model validates on almost the same data it was trained on, which leads to an unreliable `accuracy`.

Detail:
In `cnn_train_day.m`, lines 98~99, in each epoch `processEpoch` does the training with `opts` containing the indexes of data whose `set == 1`. The key point is that in `processEpoch`, line 206, both training and validation use the function `getBatch` to generate the input data. In turn, `getBatch` in `train_id_net_res_2stream.m`, lines 51~57, calls `rand_same_class` and `rand_diff_class`, and `rand_same_class.m`, lines 5~8, filters out every image whose `set` is 2. This means that `rand_same_class` never produces test data during validation, and neither does `rand_diff_class`.
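The sampling behaviour described above can be sketched in Python (the MATLAB helpers are paraphrased; the toy `labels`/`sets` arrays and the function bodies are my own illustration, not the repository's code):

```python
import random

def rand_same_class(labels, sets, anchor_id):
    # Mirrors rand_same_class.m lines 5~8: candidates are restricted to
    # set == 1 (training images) regardless of the current phase.
    pool = [i for i, (lab, s) in enumerate(zip(labels, sets))
            if lab == anchor_id and s == 1]
    return random.choice(pool)

def rand_diff_class(labels, sets, anchor_id):
    # Same filter: set == 2 (validation images) is never sampled.
    pool = [i for i, (lab, s) in enumerate(zip(labels, sets))
            if lab != anchor_id and s == 1]
    return random.choice(pool)

# Toy data: two IDs; images 0-3 are training (set 1), 4-5 validation (set 2).
labels = [0, 0, 1, 1, 0, 1]
sets   = [1, 1, 1, 1, 2, 2]

# Even when getBatch runs in the validation phase, both helpers can only
# ever return training indexes 0..3 -- never the validation images 4, 5.
for _ in range(100):
    assert rand_same_class(labels, sets, 0) in (0, 1)
    assert rand_diff_class(labels, sets, 0) in (2, 3)
```

So the validation loss is computed on pairs drawn largely from the training split, which is why the reported validation accuracy is not trustworthy.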
Summary
Because the functions `rand_same_class` and `rand_diff_class` apply the same operation in both training and validation, the model validates on almost the same data it was trained on, which leads to an unreliable `accuracy`.

Fix Advice
Add a parameter indicating the evaluation mode to `rand_same_class` and `rand_diff_class`: filter out images whose `set` is 2 during training, and filter out images whose `set` is 1 during validation. If you confirm this bug, I can post a pull request to fix it.
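A minimal Python sketch of this fix, under the assumption that a boolean flag is threaded through from `getBatch` (the `val_mode` name and the toy data layout are hypothetical, not the repository's code):

```python
import random

def rand_same_class(labels, sets, anchor_id, val_mode=False):
    # Proposed fix: sample from set == 2 in validation and from set == 1
    # in training, instead of always sampling from set == 1.
    wanted_set = 2 if val_mode else 1
    pool = [i for i, (lab, s) in enumerate(zip(labels, sets))
            if lab == anchor_id and s == wanted_set]
    return random.choice(pool)

# Toy data: each ID needs at least two validation images (indexes 4, 5
# here) -- the extra guarantee the maintainer notes the fix would require.
labels = [0, 0, 1, 1, 0, 0]
sets   = [1, 1, 1, 1, 2, 2]

assert rand_same_class(labels, sets, 0, val_mode=False) in (0, 1)  # train pool
assert rand_same_class(labels, sets, 0, val_mode=True)  in (4, 5)  # val pool
```

The same `val_mode` switch would apply symmetrically to `rand_diff_class`, so that validation pairs are built entirely from `set == 2` images.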