Cross Validation in Keras #1711
There is built-in support to hold out a percentage of the data as a validation set (the validation_split parameter on fit). My understanding is that most people do not do true k-fold cross-validation due to the computational overhead of building k models.
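For context, a minimal sketch of that built-in hold-out (the toy model and random data below are invented purely for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Dummy data for illustration: 1000 samples, 20 features, binary labels.
x = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = Sequential([
    Dense(32, activation="relu", input_shape=(20,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_split holds out the LAST 10% of the data (no shuffling happens
# before the split), so this is a single hold-out set, not k-fold CV.
model.fit(x, y, epochs=5, batch_size=32, validation_split=0.1)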
This is really handy to use just before calling Keras' model.compile(), fit(), and predict() functions: from sklearn.cross_validation import StratifiedKFold
Hey @mahdiman, here is a simplified example of how to perform k-fold CV in Keras using sklearn:

from sklearn.cross_validation import StratifiedKFold

def load_data():
    # load your data using this function
    ...

def create_model():
    # create (and compile) your model using this function
    ...

def train_and_evaluate_model(model, x_train, y_train, x_test, y_test):
    # fit on the training fold and evaluate on the held-out fold
    model.fit(x_train, y_train)
    return model.evaluate(x_test, y_test)

if __name__ == "__main__":
    n_folds = 10
    data, labels, header_info = load_data()
    skf = StratifiedKFold(labels, n_folds=n_folds, shuffle=True)
    for i, (train, test) in enumerate(skf):
        print("Running Fold", i + 1, "/", n_folds)
        model = None  # Clearing the NN.
        model = create_model()
        train_and_evaluate_model(model, data[train], labels[train], data[test], labels[test])
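Note for later readers: the sklearn.cross_validation module has since been removed from scikit-learn; with the modern API the same loop (reusing the helper functions above) looks roughly like this:

from sklearn.model_selection import StratifiedKFold

n_folds = 10
skf = StratifiedKFold(n_splits=n_folds, shuffle=True)
for i, (train, test) in enumerate(skf.split(data, labels)):
    print("Running Fold", i + 1, "/", n_folds)
    model = create_model()  # fresh weights every fold
    train_and_evaluate_model(model, data[train], labels[train], data[test], labels[test])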
Thanks guys :)
Is there any way to perform cross-validation when the label is a 2D image instead of a plain 1D label, i.e. when the label is not a vector of shape (n,)?
@zhipeng-fan - It's not massively clear what exactly you're describing here. Can you provide an example?
You may want to check this for an actual example: basically, save the model after each fold and average each individual model's predictions at test time.
@zhipeng-fan sklearn's StratifiedKFold works only on 1D labels:
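Two common workarounds, as a sketch (the label shapes below are assumptions about the use case): stratify one-hot labels on their argmax, and fall back to plain KFold for image labels, where there is no single class to stratify on.

import numpy as np
from sklearn.model_selection import StratifiedKFold, KFold

# One-hot labels of shape (n, num_classes): stratify on the class index.
y_onehot = np.eye(3)[np.random.randint(0, 3, size=100)]
skf = StratifiedKFold(n_splits=5, shuffle=True)
for train, test in skf.split(np.zeros(len(y_onehot)), y_onehot.argmax(axis=1)):
    pass  # train/test are index arrays into both data and labels

# Image labels of shape (n, H, W): no classes to stratify, so use KFold.
y_images = np.random.rand(100, 32, 32)
kf = KFold(n_splits=5, shuffle=True)
for train, test in kf.split(np.zeros(len(y_images))):
    pass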
Hi all, @KeironO thanks for your helpful comment. I have one more question and would be very happy if you could answer it as well. I am a bit confused about validation and test datasets. Should we use data[test] as the validation dataset? Is that enough to show the performance of the model? What should I do for testing the model? After splitting the full dataset into 10 folds (using StratifiedKFold), I split the 9 training folds (the train indices in your example) into a training group and a validation group, and keep the last fold for testing. You can see what I have done below:

skf = StratifiedKFold(n_splits=10, shuffle=True)
splitted_indices = skf.split(np.zeros(data.shape[0]), labels)
for train, test in splitted_indices:
    train_data = data[train]
    train_labels = labels[train]
    # Hold out the last 10% of each training fold for validation.
    x_train = train_data[:int(-0.1 * len(train))]
    y_train = train_labels[:int(-0.1 * len(train))]
    x_val = train_data[int(-0.1 * len(train)):]
    y_val = train_labels[int(-0.1 * len(train)):]
    x_test = data[test]
    y_test = labels[test]
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=100, batch_size=500, callbacks=[early_stop, checkpoint])
    # loaded_model is the best model saved to disk during training,
    # based on the validation dataset.
    y_pred = loaded_model.predict(x_test, verbose=0)

So I basically report the performance on the test set using the predictions of loaded_model on x_test. Regards,
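As an aside, the same nested train/validation/test scheme can be written more compactly with train_test_split, which also lets the inner validation split be stratified (a sketch reusing the names above; early_stop and checkpoint are the poster's callbacks and are assumed to be defined):

from sklearn.model_selection import StratifiedKFold, train_test_split

skf = StratifiedKFold(n_splits=10, shuffle=True)
for train, test in skf.split(np.zeros(data.shape[0]), labels):
    # Inner split: 10% of each training fold becomes the validation set,
    # stratified so class proportions are preserved.
    x_train, x_val, y_train, y_val = train_test_split(
        data[train], labels[train], test_size=0.1, stratify=labels[train])
    model = create_model()  # fresh model per fold
    model.fit(x_train, y_train, validation_data=(x_val, y_val),
              epochs=100, batch_size=500, callbacks=[early_stop, checkpoint])
    # Report fold performance on the untouched test fold.
    print(model.evaluate(data[test], labels[test], verbose=0))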
Hi @KeironO. There is one problem though: when you train using ImageDataGenerator, which flows images into memory batch by batch (for big datasets), how do you do cross-validation? In the end, do I need to create 10 folders with the k subsets myself?
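One way to avoid hand-building fold folders, sketched under the assumption that your images sit on disk with a table of paths and labels (flow_from_dataframe is a real Keras preprocessing API of this era; filenames and class_names are placeholders for your data):

import pandas as pd
from sklearn.model_selection import KFold
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical dataframe: one row per image, with full file paths.
df = pd.DataFrame({"filename": filenames, "class": class_names})

datagen = ImageDataGenerator(rescale=1.0 / 255)
kf = KFold(n_splits=10, shuffle=True)
for train_idx, test_idx in kf.split(df):
    train_gen = datagen.flow_from_dataframe(df.iloc[train_idx],
                                            target_size=(224, 224),
                                            batch_size=32)
    test_gen = datagen.flow_from_dataframe(df.iloc[test_idx],
                                           target_size=(224, 224),
                                           batch_size=32,
                                           shuffle=False)
    model = create_model()
    model.fit_generator(train_gen, epochs=10)
    model.evaluate_generator(test_gen)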
Hi @KeironO, the comment thread was quite useful. I am working on a dataset where the problem is a regression problem: the labels are continuous values. How do I perform cross-validation in Keras in that case? If I use StratifiedKFold it raises the following error: Traceback (most recent call last):
@tharuniitk As you can't do stratification on a regression problem's ground truth, you'll probably have to use plain KFold, as outlined by someone earlier above.
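A minimal sketch of KFold for continuous targets (data, labels, and create_model are assumed to exist as in the earlier examples):

from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True)
for i, (train, test) in enumerate(kf.split(data)):
    model = create_model()  # fresh weights per fold
    model.fit(data[train], labels[train], epochs=50, batch_size=32)
    mse = model.evaluate(data[test], labels[test], verbose=0)
    print("Fold", i + 1, "MSE:", mse)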
@hitzkrieg Yes, the model inherits all trained weights from the previous fold if it is not re-initialized. Be careful here, otherwise your cross-validation is useless! It all depends on what the create_model() function does. If you re-create the model by overwriting the model variable with a new initialization in each fold, you are fine. There is no way to re-initialize a model in place in Keras, so you have to re-create it in each fold; otherwise you keep training the same model, which means your cross-validation is fake.
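Concretely, a per-fold reset pattern (clear_session is a real Keras backend utility; the rest reuses the names from the earlier examples):

from keras import backend as K

for i, (train, test) in enumerate(skf.split(data, labels)):
    K.clear_session()        # free the old graph so folds don't accumulate
    model = create_model()   # build AND compile from scratch -> new random weights
    model.fit(data[train], labels[train])
    model.evaluate(data[test], labels[test])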
Thank you @audiofeature. KFold is working!
@KeironO do we have to create the model again for each fold?
Hi! Yes, absolutely. Otherwise the model effectively sees all of the data, and the evaluation proves nothing even if you obtain a good result.
@zhipeng-fan I thought fitting the model on new data was enough, but I see now. Thanks :)
You most certainly have to call model.compile() in each fold, instead of (or in addition to) model = None.
(In reply to helwilliams, who wrote on 19.12.2018:)
I am trying to do cross-validation of previously saved models at different epochs (also saved) to try some early-stopping analysis. However, when I do model = None, it still doesn't completely clear the NN: I still get weird grey/grey-white results that don't match the corresponding model run on its own, outside the loop over models. Am I missing something else?
Thanks in advance :)
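An alternative hard reset, sketched with standard Keras calls (whether it cures the grey-output issue described above is untested): snapshot the freshly initialized weights once, then restore them and re-compile before every fold.

from sklearn.model_selection import StratifiedKFold
from keras.optimizers import Adam

model = create_model()                  # assumed to return a compiled model
initial_weights = model.get_weights()   # snapshot of the fresh random weights

skf = StratifiedKFold(n_splits=10, shuffle=True)
for train, test in skf.split(data, labels):
    model.set_weights(initial_weights)  # hard reset of the layer weights
    # Re-compile so optimizer state (e.g. Adam's moment estimates) resets too,
    # echoing the compile-per-fold advice above.
    model.compile(optimizer=Adam(), loss="binary_crossentropy")
    model.fit(data[train], labels[train])
    model.evaluate(data[test], labels[test])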
But the question still exists: how do you save the model according to the cross-validation loss? Keras' model saving only monitors "val_loss", without considering which fold it is.
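One common pattern (an assumption, not confirmed in this thread): give ModelCheckpoint a distinct filepath per fold, so the best "val_loss" is tracked within each fold independently.

from keras.callbacks import ModelCheckpoint

for i, (train, test) in enumerate(skf.split(data, labels)):
    checkpoint = ModelCheckpoint("model_fold_%d.h5" % i,
                                 monitor="val_loss",
                                 save_best_only=True)
    model = create_model()
    model.fit(data[train], labels[train],
              validation_split=0.1,
              epochs=100,
              callbacks=[checkpoint])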
Is there any built-in feature in Keras that lets me do cross-validation, or do I have to do it myself?
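There is no k-fold feature built into Keras itself (validation_split is a single hold-out). One low-effort route, assuming the scikit-learn wrapper that shipped with older Keras versions (newer setups use the separate scikeras package instead), is:

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

# create_model must return a compiled Keras model.
estimator = KerasClassifier(build_fn=create_model, epochs=50, batch_size=32, verbose=0)
scores = cross_val_score(estimator, data, labels, cv=10)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))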