
Add validation_split to fit_generator #3900 #9745

Closed: wants to merge 2 commits into base: master

2 participants
kouml (Contributor) commented Mar 24, 2018

This is an alternative solution for adding validation_split to fit_generator (#3900).
I wrote a utility that splits the data in original_dir into training and validation directories according to a validation_split argument.

I'd like to get your feedback.
If it does not fit the Keras API, please close this PR.

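```python
import keras
from keras.preprocessing.image import ImageDataGenerator

original_dir = 'data/train'  # hypothetical path: one subdirectory per class

# train_valid_split is the utility proposed in this PR (not part of released Keras).
# Entries in train_dir are aliases (symlinks) to the files in original_dir,
# and train_dir is a temporary directory.
train_dir, val_dir = keras.utils.data_utils.train_valid_split(original_dir, validation_split=0.1)

train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# No augmentation for validation data, only rescaling.
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

val_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')
```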
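For reference, here is a minimal sketch of what a symlink-based `train_valid_split` could look like. It is not the PR's actual code: it assumes `original_dir` has one subdirectory per class, sends the first `int(n * validation_split)` sorted files of each class to validation, and creates both outputs with `tempfile.mkdtemp`, so the caller has to delete them.

```python
import os
import tempfile


def train_valid_split(original_dir, validation_split=0.1):
    """Hypothetical sketch: mirror original_dir/<class>/<file> into two
    temporary directories of symlinks, one for training, one for validation.

    Returns (train_dir, val_dir). Nothing here deletes them; the caller
    must remove both (e.g. with shutil.rmtree) when done.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be in (0, 1)")
    train_dir = tempfile.mkdtemp(prefix="train_")
    val_dir = tempfile.mkdtemp(prefix="val_")
    for class_name in sorted(os.listdir(original_dir)):
        class_path = os.path.join(original_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # only class subdirectories are considered
        files = sorted(f for f in os.listdir(class_path)
                       if os.path.isfile(os.path.join(class_path, f)))
        n_val = int(len(files) * validation_split)
        for subset_dir, subset_files in ((val_dir, files[:n_val]),
                                         (train_dir, files[n_val:])):
            target = os.path.join(subset_dir, class_name)
            os.makedirs(target, exist_ok=True)
            for f in subset_files:
                # The link points at the original file; creating it requires
                # permission to make symlinks on this filesystem.
                os.symlink(os.path.abspath(os.path.join(class_path, f)),
                           os.path.join(target, f))
    return train_dir, val_dir
```

Note that this sketch needs permission to create symlinks and leaves both directories on disk until someone removes them.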
fchollet (Collaborator) commented Mar 26, 2018

Thanks for the PR. This API looks fine, but since we have already merged this feature into the API, we will not pursue this alternative.

In terms of implementation, the use of os.symlink on a lot of files seems risky (and the user may not always have the rights to create them, since it is relatively common to read files from a shared read-only directory). The symlinks are also seemingly never destroyed?
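Both concerns go away if the split is computed in memory instead of on disk. A minimal sketch, assuming the same one-subdirectory-per-class layout (`split_file_lists` is a hypothetical name): it only reads `original_dir`, so it works on a read-only share and leaves nothing to clean up. As far as we know this is close to what the merged `subset` option does internally (a per-class slice of each sorted file list), but treat that detail as an assumption.

```python
import os


def split_file_lists(original_dir, validation_split=0.1):
    """Hypothetical sketch: split original_dir/<class>/<file> into training
    and validation lists of (path, class_name) pairs, writing nothing to disk.

    Within each class, files are sorted; the first int(n * validation_split)
    go to validation and the rest to training.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be in (0, 1)")
    train, val = [], []
    for class_name in sorted(os.listdir(original_dir)):
        class_path = os.path.join(original_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # only class subdirectories are considered
        files = sorted(f for f in os.listdir(class_path)
                       if os.path.isfile(os.path.join(class_path, f)))
        n_val = int(len(files) * validation_split)
        val += [(os.path.join(class_path, f), class_name) for f in files[:n_val]]
        train += [(os.path.join(class_path, f), class_name) for f in files[n_val:]]
    return train, val
```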

kouml (Contributor Author) commented Mar 27, 2018

Thanks for the review.
I think one benefit of this implementation is that the split can be customized for each generator.
As for cleanup, in my opinion it is easy to handle: the temporary links can be kept or removed depending on whether or not you keep the tempfile object.
However, since the other solution is already merged, this PR should be closed.
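The "keep the tempfile object" idea can be sketched with `tempfile.TemporaryDirectory`: return the object rather than a bare path, and the links live exactly as long as the caller holds on to it. `make_symlink_view` is a hypothetical helper, and it assumes the source files have distinct basenames.

```python
import os
import tempfile


def make_symlink_view(files):
    """Hypothetical helper: create a TemporaryDirectory holding one symlink
    per source file and return the TemporaryDirectory object itself.

    Calling .cleanup() (or leaving a `with` block) deletes the directory and
    every link in it; the source files themselves are left untouched.
    Basenames must be unique, otherwise os.symlink raises FileExistsError.
    """
    tmp = tempfile.TemporaryDirectory(prefix="split_")
    for src in files:
        os.symlink(os.path.abspath(src),
                   os.path.join(tmp.name, os.path.basename(src)))
    return tmp
```

A `TemporaryDirectory` also tries to clean itself up when it is garbage collected, but calling `cleanup()` or using a `with` block is more predictable. This still does not help when symlinks cannot be created at all.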

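Is the usage like this?

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(validation_split=0.2)

# target_size is (height, width); the channel count comes from color_mode ('rgb' by default)
t = datagen.flow_from_directory(
    'cifar',
    target_size=(224, 224),
    subset='training')

v = datagen.flow_from_directory(
    'cifar',
    target_size=(224, 224),
    subset='validation')
```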
fchollet (Collaborator) commented Mar 27, 2018

Yes, that's correct.

Closing the PR then.
