
ImageGenerator for multiple inputs #3386

Closed
jagiella opened this issue Aug 3, 2016 · 22 comments

@jagiella commented Aug 3, 2016

I have built a model which consists of two branches which are then merged into a single one. For training I would like to use the ImageDataGenerator to augment the image data, but I don't know how to make it work for the mixed input type. Does anybody have an idea how to deal with this in Keras?
Any help would be highly appreciated!

Best,
Nick

MODEL
The first branch takes images as input:

img_model = Sequential()
img_model.add(Convolution2D( 4, 9,9, border_mode='valid', input_shape=(1, 120, 160)))
img_model.add(Activation('relu'))
img_model.add(MaxPooling2D(pool_size=(2, 2)))
img_model.add(Dropout(0.5))
img_model.add(Flatten()) 

The second branch takes auxiliary data as input:

aux_model = Sequential()
aux_model.add(Dense(3, input_dim=3))

Then those get merged into the final model:

model = Sequential()
model.add(Merge([img_model, aux_model], mode='concat'))
model.add(Dropout(0.5))
model.add(Dense(5))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])

TRAINING / PROBLEM:
I tried to do the following, which obviously failed:

datagen = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            zca_whitening=False,  # apply ZCA whitening
            rotation_range=10, #180,  # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=False,  # randomly flip images
            vertical_flip=False)  # randomly flip images

model.fit_generator( datagen.flow( [X,I], Y, batch_size=64),
               samples_per_epoch=X.shape[0],
               nb_epoch=20,
               validation_data=([Xval, Ival], Yval))

This produces the following error message:

Traceback (most recent call last):
  File "importdata.py", line 139, in <module>
    model.fit_generator( datagen.flow( [X,I], Y, batch_size=64),
  File "/usr/local/lib/python3.5/dist-packages/keras/preprocessing/image.py", line 261, in flow
    save_to_dir=save_to_dir, save_prefix=save_prefix, save_format=save_format)
  File "/usr/local/lib/python3.5/dist-packages/keras/preprocessing/image.py", line 454, in __init__
    'Found: X.shape = %s, y.shape = %s' % (np.asarray(X).shape, np.asarray(y).shape))
  File "/usr/local/lib/python3.5/dist-packages/numpy/core/numeric.py", line 482, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not broadcast input array from shape (42700,1,120,160) into shape (42700)

@fchollet (Member) commented Aug 3, 2016

You need a generator that yields something of the form ([x1, x2], y). So you need to write your own generator, for which you can reuse the original ImageDataGenerator for one or more of the inputs.
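
For illustration, a minimal sketch of such a generator (the names X_img, X_aux, Y, datagen and the batch size are assumptions, not from this issue; jagiella posts a fuller version below):

def multi_input_generator(X_img, X_aux, Y, datagen, batch_size=64):
    # shuffle=False keeps the batch order aligned with the auxiliary array
    flow = datagen.flow(X_img, Y, batch_size=batch_size, shuffle=False)
    i = 0
    while True:
        x_batch, y_batch = next(flow)
        n = x_batch.shape[0]
        # slice the auxiliary input to match the current image batch
        yield [x_batch, X_aux[i:i + n]], y_batch
        i = (i + n) % X_img.shape[0]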

fchollet closed this as completed Aug 3, 2016

@jagiella (Author) commented Aug 4, 2016

That is what I would like to do, but I don't really know how to create one which will give proper results. One issue I see, for example, is related to shuffling: if I use the original ImageDataGenerator with shuffling, I would somehow need to know each image's corresponding index in the original image stack.

@jagiella (Author) commented Aug 4, 2016

OK, I made it work! For anybody asking themselves the same question, here is my example solution:

def createGenerator( X, I, Y):

    while True:
        # shuffled indices
        idx = np.random.permutation( X.shape[0])
        # create image generator
        datagen = ImageDataGenerator(
                featurewise_center=False,  # set input mean to 0 over the dataset
                samplewise_center=False,  # set each sample mean to 0
                featurewise_std_normalization=False,  # divide inputs by std of the dataset
                samplewise_std_normalization=False,  # divide each input by its std
                zca_whitening=False,  # apply ZCA whitening
                rotation_range=10, #180,  # randomly rotate images in the range (degrees, 0 to 180)
                width_shift_range=0.1, #0.1,  # randomly shift images horizontally (fraction of total width)
                height_shift_range=0.1, #0.1,  # randomly shift images vertically (fraction of total height)
                horizontal_flip=False,  # randomly flip images
                vertical_flip=False)  # randomly flip images

        batches = datagen.flow( X[idx], Y[idx], batch_size=64, shuffle=False)
        idx0 = 0
        for batch in batches:
            idx1 = idx0 + batch[0].shape[0]

            yield [batch[0], I[ idx[ idx0:idx1 ] ]], batch[1]

            idx0 = idx1
            if idx1 >= X.shape[0]:
                break
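
A usage sketch, reusing the values from the original fit_generator call above (this call is not part of jagiella's post):

model.fit_generator( createGenerator( X, I, Y),
               samples_per_epoch=X.shape[0],
               nb_epoch=20,
               validation_data=([Xval, Ival], Yval))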

@jockes60

Here's a piece of code that formats the outputs of two generators. It can be extended to any number of generators. Assuming the output of both generators is of the form (x, y) and the desired output is of the form ([x1, x2], y1):

def format_gen_outputs(gen1,gen2):
    x1 = gen1[0]
    x2 = gen2[0]
    y1 = gen1[1]
    return [x1, x2], y1

combo_gen = map(format_gen_outputs, gen1, gen2)
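
One possible way to build gen1 and gen2 and use the combined generator (the flow() calls, seed value and fit_generator arguments here are illustrative assumptions, not from jockes60's comment). Using the same seed keeps the two flows shuffling in the same order, so paired samples stay aligned:

datagen = ImageDataGenerator(rotation_range=10)
gen1 = datagen.flow(X1, Y, batch_size=64, seed=1)
gen2 = datagen.flow(X2, Y, batch_size=64, seed=1)

combo_gen = map(format_gen_outputs, gen1, gen2)  # map is lazy in Python 3
model.fit_generator(combo_gen,
                    steps_per_epoch=len(X1) // 64,
                    epochs=20)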

@ahmedhosny

@jagiella I have a similar structure but instead of one datagen.flow, I have three from three different sources. My problem is I want to make sure the same set of augmentations is applied to arrays of the same index across all three batches. Any ideas? I think the seed argument in datagen.flow is for shuffling only.

@drorhilman commented Jun 15, 2017

I am using a slightly different variation...

generator = ImageDataGenerator(rotation_range=90,
                               width_shift_range=0.05,
                               height_shift_range=0.05,
                               zoom_range=0.1)

def generate_data_generator_for_two_images(X1, X2, Y):
    # the same seed keeps the two flows shuffling in the same order
    genX1 = generator.flow(X1, Y, seed=7)
    genX2 = generator.flow(X2, seed=7)  # no labels passed, so it yields image batches only
    while True:
        X1i = genX1.next()
        X2i = genX2.next()
        yield [X1i[0], X2i], X1i[1]

@zyavrik commented Oct 15, 2017

I get the following error when using the function below:

File "data_utils.py", line 569, in data_generator_task
generator_output = next(self._generator)
TypeError: 'function' object is not an iterator

trainDataGenerator = ImageDataGenerator(...)
trainGeneratorBasic = trainDataGenerator.flow(input, inputLabels)

def trainGenerator():
    while True:
        xy = trainGeneratorBasic.next()
        yield [xy[0], xy[0], xy[0]], xy[1]

UPDATE

Fixed with the following code (fit_generator needs a generator object, so the generator function has to be called first):

def trainGeneratorFunc():
    while True:
        xy = trainGeneratorBasic.next()
        yield [xy[0], xy[0], xy[0]], xy[1]

trainGenerator = trainGeneratorFunc()

@tenbabagu

I have a similar question: I want to use the triplet loss, so I need three images: two different ones from the same class, and one from another class. Has anyone done similar work?

@tlatlbtle commented Dec 29, 2017

@jagiella I used this piece of code; however, it shows the error message below:

~/anaconda3/lib/python3.6/site-packages/keras/utils/data_utils.py in data_generator_task()
633 try:
634 if self._use_multiprocessing or self.queue.qsize() < max_queue_size:
--> 635 generator_output = next(self._generator)
636 self.queue.put((True, generator_output))
637 else:

ValueError: generator already executing

Can you tell what's wrong with it?

@DNXie commented Jan 22, 2018

@drorhilman
I'm using your method like this:
parallel_model.fit_generator(
    generate_data_generator_for_two_images(xs_train, xl_train, y_train),
    epochs=epochs,
    steps_per_epoch=int(np.ceil(xs_train.shape[0] / float(batch_size))),
    validation_data=([xs_test, xl_test], y_test),
    class_weight='auto',
    workers=4)

and got this error:
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.4/dist-packages/keras/utils/data_utils.py", line 579, in data_generator_task
    generator_output = next(self._generator)
ValueError: generator already executing

There are a lot of similar methods like yours; I tried most of them and got similar error messages.
Why does this happen, and how can I fix it?
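
This error typically means that several worker threads are calling next() on the same plain Python generator at once; such a generator is not thread-safe. A common workaround (not posted in this thread; the class name and usage below are illustrative) is to guard next() with a lock:

import threading

class ThreadSafeIterator:
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # only one thread may advance the wrapped generator at a time
        with self.lock:
            return next(self.it)

safe_gen = ThreadSafeIterator(
    generate_data_generator_for_two_images(xs_train, xl_train, y_train))
parallel_model.fit_generator(safe_gen, epochs=epochs,
                             steps_per_epoch=xs_train.shape[0] // batch_size,
                             validation_data=([xs_test, xl_test], y_test),
                             workers=4)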

@DNXie commented Jan 22, 2018

@jockes60
Oh my god! This works! Thanks a lot!!!

@FrancisYizhang

@fchollet
I have multiple outputs and it runs successfully when workers=1; however, it breaks when workers is larger than 1, and the error is shown below:
Exception in thread Thread-7:

Traceback (most recent call last):
  File "C:\Users\Francis_161014\AppData\Local\conda\conda\envs\tensorflow\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Users\Francis_161014\AppData\Local\conda\conda\envs\tensorflow\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Francis_161014\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\utils\data_utils.py", line 560, in data_generator_task
    generator_output = next(self._generator)
ValueError: generator already executing
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.2\helpers\pydev_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "", line 12, in
  File "C:\Users\Francis_161014\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Francis_161014\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1809, in fit_generator
    generator_output = next(output_generator)
StopIteration

The following is my code:
generator_train = ImageDataGenerator(
    featurewise_center=False,
    featurewise_std_normalization=False,
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1)
generator_train.fit(x_train)

model.fit_generator(
    generate_data_generator(generator=generator_train,
                            X=x_train,
                            Y=y_train,
                            batch_size=batch_size,
                            num_classes=num_classes),
    steps_per_epoch=len(x_train) // batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test),
    callbacks=callbacks,
    workers=4,
    verbose=2)

@FrancisYizhang

@DNXie
Does your program work when workers is larger than 1?

@FrancisYizhang

My system is Windows 7.

@DNXie commented Jan 27, 2018

@FrancisYizhang My code above has workers=4

@FrancisYizhang

@DNXie
Do you use multiprocessing, i.e. use_multiprocessing=True?

@MjdMahasneh

Could anyone kindly help me solve this problem: #10499? I tried implementing the same generator as in this post, but I can't figure out where my mistake is. Any help is very much appreciated.

@ad12 commented Jul 25, 2018

@ahmedhosny did you ever find a solution for applying the same transform to the images at the same index in the two different arrays?

@gledsonmelotti

Hello guys, how are you? I have a question. I will train two input sets on the same network: model1 receives input X1 (three folders of classes, each class with training, validation, and test data) and model2 receives input X2 (likewise three classes, each with training, validation, and test data). Then I concatenate a convolution of model X1 with one of model X2, so I have two training sets, two validation sets, and two test sets. My question is about the following command: steps_per_epoch = nb_train_samples // batchsize. Should nb_train_samples be only training_class1_X1 + training_class2_X1 + training_class3_X1, or the sum of (training_class1_X1 + training_class2_X1 + training_class3_X1) + (training_class1_X2 + training_class2_X2 + training_class3_X2)?

@TheStoneMX commented Oct 7, 2019

Found it on the internet, don't remember where...

def two_image_generator(generator,
                        df,
                        directory,
                        batch_size,
                        x_col = 'filename',
                        y_col = None,
                        model = None,
                        shuffle = False,
                        img_size1 = (224, 224),
                        img_size2 = (299, 299)):

    gen1 = generator.flow_from_dataframe(
        df,
        directory,
        x_col = x_col,
        y_col = y_col,
        target_size = img_size1,
        class_mode = model,
        batch_size = batch_size,
        shuffle = shuffle,
        seed = 1)

    gen2 = generator.flow_from_dataframe(
        df,
        directory,
        x_col = x_col,
        y_col = y_col,
        target_size = img_size2,
        class_mode = model,
        batch_size = batch_size,
        shuffle = shuffle,
        seed = 1)

    while True:
        X1i = gen1.next()
        X2i = gen2.next()
        if y_col:
            yield [X1i[0], X2i[0]], X1i[1]  # X1i[1] is the label
        else:
            yield [X1i, X2i]
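
A hypothetical call (train_df, train_dir and the 'label' column are placeholders, not from the original comment):

train_gen = two_image_generator(ImageDataGenerator(rescale=1. / 255),
                                train_df, train_dir,
                                batch_size=32,
                                y_col='label',
                                model='categorical',  # passed through as class_mode
                                shuffle=True)

model.fit_generator(train_gen,
                    steps_per_epoch=len(train_df) // 32,
                    epochs=10)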

@gledsonmelotti

@TheStoneMX good idea. Thank you very much.

@lamba92 commented Mar 3, 2020

But what if your ImageDataGenerator adds augmented data and you need to match the right features to each image? How do I know which image belongs to which row of my dataframe?
