Support python generators #984

junyongyou · 2020-02-18T13:55:34Z

I have a simple task: finding the best CNN architecture for image regression. However, my dataset is too large to load into memory at once. In the current release, ImageRegressor seems to support only a fit method that requires all the data (x and y) to be loaded in memory. How can I use a generator in AutoKeras? I checked the closed issue #204, but it does not seem to have been solved.

I have already tried converting my generator to a tf.data.Dataset, but it didn't work. For example,

    dataset = tf.data.Dataset.from_generator(generate_batch, (tf.float32, tf.float32))
    vq_predictor = ak.ImageRegressor()
    for i, (X, y) in enumerate(dataset):
        X_dataset = tf.data.Dataset.from_tensors(X)
        y_dataset = tf.data.Dataset.from_tensors(y)
        vq_predictor.fit(X_dataset, y_dataset, validation_split=0.2)

Then I got this error:

File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\tasks\image.py", line 222, in fit
**kwargs)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 231, in fit
validation_split=validation_split)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 313, in _prepare_data
dataset, validation_data = utils.split_dataset(dataset, validation_split)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\utils.py", line 69, in split_dataset
raise ValueError('The dataset should at least contain 2 '
ValueError: The dataset should at least contain 2 instances to be split.

Any suggestions are highly appreciated.

haifeng-jin · 2020-02-19T17:02:50Z

We have not tested with generators yet. I will look into this.
It seems the dataset was detected as containing fewer than 2 instances, although I assume it has more.

You may try providing the validation_data yourself. That should avoid this issue, but it may run into other errors.
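
One way to follow the suggestion above is to split the generator's batches into two streams and pass the second one as validation_data. The sketch below is only an illustration: fake_batches, split_batches and to_dataset are hypothetical names, the every-fifth-batch rule is an arbitrary choice, and whether fit accepts the resulting datasets depends on the AutoKeras version. As a side note, tf.data.Dataset.from_tensors(X) wraps its whole argument as a single dataset element, which is consistent with the "at least contain 2 instances" message in the traceback.

```python
def split_batches(make_batches, val_every=5):
    """Return two generator factories: training and validation batches.

    make_batches is a zero-argument callable returning a fresh, finite
    iterator of (x, y) batches. It must produce the batches in the same
    order on every call, otherwise the two streams can overlap.
    """
    def train():
        for i, batch in enumerate(make_batches()):
            if i % val_every != val_every - 1:
                yield batch

    def val():
        for i, batch in enumerate(make_batches()):
            if i % val_every == val_every - 1:
                yield batch

    return train, val


def to_dataset(factory, x_shape, y_shape):
    """Wrap a generator factory in a tf.data.Dataset.

    TensorFlow is imported here so the rest of the sketch runs without it.
    """
    import tensorflow as tf
    return tf.data.Dataset.from_generator(
        factory,
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape(x_shape), tf.TensorShape(y_shape)),
    )


# Hypothetical stand-in for the real batch generator: 10 tiny (x, y) batches.
def fake_batches():
    for i in range(10):
        yield [float(i)], [float(i) * 2]


train_gen, val_gen = split_batches(fake_batches, val_every=5)
train_list = list(train_gen())  # batches 0-3 and 5-8
val_list = list(val_gen())      # batches 4 and 9
```

With real data, the two factories would be wrapped with to_dataset (using None for the batch dimension in the shapes) and passed as fit(train_ds, validation_data=val_ds), with validation_split left unset.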

ciberger · 2020-05-05T12:33:56Z

Description

Hi! I'm dealing with the same situation where I want to use a generator to read from a massive training set. If generators are not tested, can you recommend an alternative approach?

Thanks in advance! 🙌

Expected Behavior

I expected to train the model using a tf.data.Dataset created with from_generator as the training set.

Alternatively, I would like to convert such a dataset into train/test inputs that AutoKeras accepts.

Reproducible code

import autokeras as ak
import numpy as np
import tensorflow as tf
import pathlib

# Define general parameters
BATCH_SIZE = 32
IMG_HEIGHT = 150
IMG_WIDTH = 150

data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

data_dir = pathlib.Path(data_dir)

CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
# array(['roses', 'sunflowers', 'daisy', 'dandelion', 'tulips'], dtype='<U10')


# ImageDataGenerator using Keras
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Define the parameters
train_data_gen = train_gen.flow_from_directory(
    batch_size=BATCH_SIZE,
    directory=data_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='categorical'
)

# Wrapping Keras generator
train_ds = tf.data.Dataset.from_generator(
    lambda: train_data_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes = ([BATCH_SIZE,IMG_HEIGHT,IMG_WIDTH,3],
                     [BATCH_SIZE,len(CLASS_NAMES)]))
# Found 3670 images belonging to 5 classes.

images, labels = next(train_data_gen)
print(images.dtype, images.shape)
print(labels.dtype, labels.shape)
# float32 (32, 150, 150, 3)
# float32 (32, 5)

# Initialize the image classifier.
clf = ak.ImageClassifier(max_trials=2)
# Feed the image classifier with training data.
clf.fit(train_ds, epochs=2)

Error message

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
 in 
      2 clf = ak.ImageClassifier(max_trials=2)
      3 # Feed the image classifier with training data.
----> 4 clf.fit(train_ds, epochs=2)

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/tasks/image.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
    120                     validation_split=validation_split,
    121                     validation_data=validation_data,
--> 122                     **kwargs)
    123 
    124 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, **kwargs)
    237             y=y,
    238             validation_data=validation_data,
--> 239             validation_split=validation_split)
    240 
    241         # Process the args.

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in _prepare_data(self, x, y, validation_data, validation_split)
    319         if validation_data is None and validation_split:
    320             self._split_dataset = True
--> 321             dataset, validation_data = utils.split_dataset(dataset, validation_split)
    322         return dataset, validation_data
    323 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/utils.py in split_dataset(dataset, validation_split)
     79         A tuple of two tf.data.Dataset. The training set and the validation set.
     80     """
---> 81     num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
     82     if num_instances < 2:
     83         raise ValueError('The dataset should at least contain 2 '

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in reduce(self, initial_state, reduce_func)
   1932             f=reduce_func,
   1933             output_shapes=structure.get_flat_tensor_shapes(state_structure),
-> 1934             output_types=structure.get_flat_tensor_types(state_structure)))
   1935 
   1936   def unbatch(self):

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py in reduce_dataset(input_dataset, initial_state, other_arguments, f, output_types, output_shapes, use_inter_op_parallelism, name)
   4659         pass  # Add nodes to the TensorFlow graph.
   4660     except _core._NotOkStatusException as e:
-> 4661       _ops.raise_from_not_ok_status(e, name)
   4662   # Add nodes to the TensorFlow graph.
   4663   if not isinstance(output_types, (list, tuple)):

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6604   message = e.message + (" name: " + name if name is not None else "")
   6605   # pylint: disable=protected-access
-> 6606   six.raise_from(core._status_to_exception(e.code, message), None)
   6607   # pylint: enable=protected-access
   6608 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.
Traceback (most recent call last):

  File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
    ret = func(*args)

  File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 825, in generator_py_func
    "of shape %s was expected." % (ret_array.shape, expected_shape))

ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.


	 [[{{node PyFunc}}]] [Op:ReduceDataset]

Setup Details

OS type and version: macOS Catalina (10.15)
Python: 3.7.4
autokeras: 1.0.2
keras-tuner: 1.0.1
scikit-learn: 0.22.2.post1
numpy: 1.18.3
pandas: 1.0.3
tensorflow: 2.1.0
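
The shape mismatch in the traceback follows from the dataset size: 3670 images do not divide evenly into batches of 32, so the last batch of a pass holds 22 images, while output_shapes fixed the batch dimension at 32. A minimal sketch of the arithmetic, with batch_sizes as a hypothetical helper:

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches produced by one pass over num_samples items."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(3670, 32)
print(len(sizes), sizes[-1])  # 115 22
```

Declaring the batch dimension as unknown, for example output_shapes=([None, IMG_HEIGHT, IMG_WIDTH, 3], [None, len(CLASS_NAMES)]), should let from_generator accept the short batch. Whether AutoKeras then handles it is a separate question.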

haifeng-jin · 2020-05-16T19:14:40Z

Currently we don't support generators. They do not seem to be the mainstream way to do this job in the future of tf.keras. We will try to support TFRecords, which we haven't really tested yet.

ciberger · 2020-05-17T16:09:38Z

Hi @haifeng-jin. Thanks for your reply.

Are you aware whether it's possible to convert a tf.data.Dataset created with from_generator into an ndarray to be used as a training set?

Thanks

haruiz · 2020-05-17T16:55:42Z

@ciberger, this code seems to work; you can give it a shot:

import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
data_dir = "dataset/train"
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

def preprocess(img):
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    return img / 255.0

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    validation_split=0.2,
    preprocessing_function=preprocess)

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)

val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# for image, label in train_dataset.take(1):
#   print("Image shape: ", image.numpy().shape)
#   print("Label: ", label.numpy())
#   plt.imshow(image.numpy()[0] * 255)
#   plt.show()


clf = ak.ImageClassifier(max_trials=10)
#Feed the tensorflow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
#Evaluate the best model.
print(clf.evaluate(val_dataset))

Then you can pass the tf.data datasets to your fit function.
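
One caveat with the wrapper above: Keras image iterators such as the one returned by flow_from_directory keep cycling through the data when iterated, so a plain for loop over the generator does not stop on its own. A possible variation, sketched under the assumption that the generator supports len() and integer indexing (as Keras Sequence-style iterators do), yields exactly one pass. FakeSequence is a hypothetical stand-in so the snippet runs without TensorFlow.

```python
import math


def one_epoch(generator):
    """Yield each batch of `generator` exactly once, then stop."""
    for i in range(len(generator)):
        yield generator[i]


class FakeSequence:
    """Hypothetical stand-in for a Keras iterator: supports len() and []."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per pass, counting a final partial batch.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        chunk = self.samples[start:start + self.batch_size]
        return chunk, [0] * len(chunk)  # (inputs, dummy labels)


batches = list(one_epoch(FakeSequence(list(range(10)), batch_size=4)))
print([len(x) for x, _ in batches])  # [4, 4, 2]
```

With the code from this comment, the call would become from_generator(lambda: one_epoch(train_generator), ...), which gives the dataset a definite end.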

cibic89 · 2020-05-29T23:18:44Z

@ciberger, this code seems to work, you can give a shot:

import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
data_dir = tf.keras.utils.get_file(
  'flower_photos',
  'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
  untar=True)
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

def preprocess(img):
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    return img / 255.0

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                              horizontal_flip=True,
                                              validation_split=0.2,
                                              preprocessing_function=preprocess)

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)

val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

for image, label in train_dataset.take(1):
  print("Image shape: ", image.numpy().shape)
  print("Label: ", label.numpy()[0])
  plt.imshow(image.numpy()[0].squeeze(axis=2) * 255)
  plt.show()

clf = ak.ImageClassifier(max_trials=10)
#Feed the tensorflow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
#Evaluate the best model.
print(clf.evaluate(val_dataset))

Then you can feed your fit function with the tf.datasets

Your example shows no output for me. I'm using Google Colab with GPU support, and I have gotten it working before (albeit with bad predictive performance).

UPDATE: I know the generator works when I test it using the adapted code above
UPDATE: Tried on local machine and the same issue happens, TensorFlow 2.2.0, AutoKeras 1.0.2
UPDATE: The problem is with AutoKeras #1075

Pan-EDU · 2020-06-02T03:43:38Z

Hi @haruiz & @cibic89,

I downloaded a few MNIST images into a local folder to try your method, but it only showed the following message and then the system hung:

Found 9 images belonging to 3 classes

While it hung, I could see it using a lot of CPU:

CMD %MEM %CPU
python3 script.py 0.7 117

Could you tell me how to deal with this situation?
I used tensorflow 2.1.0 and the current version of autokeras.
Thank you!

haifeng-jin · 2020-06-24T04:37:08Z

Currently, we are short of hands. It would be great if anyone could contribute an example using TFRecord and tf.data. @naitslup has provided this example (#1060 (comment)), which would be a good start.

cibic89 · 2020-06-24T10:05:40Z

Happy to do this.

abdulsam · 2020-07-15T17:16:29Z

Running @haruiz's example above, I get the following error on the line clf.fit(train_dataset, epochs=60): ValueError: Cannot take the length of shape with unknown rank.

cibic89 · 2020-07-18T14:08:56Z

I just discovered this one: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
You might be able to wrap your generator into a tf.data.Dataset with it.

@haifeng-jin Can I contribute a tutorial for using an image generator with AutoKeras?

haifeng-jin · 2020-07-21T19:46:39Z

@cibic89 Yes. Thank you! Please put it in this folder https://github.com/keras-team/autokeras/tree/master/docs/py
You may also try to run your code in colab to make sure it works.

haifeng-jin · 2020-07-21T19:48:41Z

@abdulsam Hi Abdus, it seems you have encountered a lot of issues. Can we set up a call to help you resolve them? You could also help us by answering some of our user study questions and providing feedback. You can reach me on Slack. Thank you!

GuileCyclone · 2020-08-20T08:07:12Z

@haifeng-jin Could tensorflow.keras.utils.Sequence be another input type to support? It is similar to the torch.utils.data.Dataset class and may be easy to use in code. I used a Sequence for a huge NumPy array stored in an HDF5 file (via h5py); the data is only read into RAM when __getitem__ is called.
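
The idea can be sketched as a class that only touches storage when a batch is requested. In practice it would subclass tensorflow.keras.utils.Sequence and the store would be an h5py dataset; here both are replaced by plain Python so the snippet runs on its own. LazyBatches and CountingStore are hypothetical names.

```python
import math


class CountingStore:
    """Stand-in for an on-disk array (e.g. an h5py dataset); counts reads."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, key):
        self.reads += 1
        return self._rows[key]


class LazyBatches:
    """Sequence-style batch loader: data is read only inside __getitem__."""

    def __init__(self, x_store, y_store, batch_size):
        self.x_store = x_store
        self.y_store = y_store
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.x_store) / self.batch_size)

    def __getitem__(self, idx):
        rows = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x_store[rows], self.y_store[rows]


x = CountingStore([[i, i] for i in range(10)])
y = CountingStore(list(range(10)))
loader = LazyBatches(x, y, batch_size=4)
xb, yb = loader[2]  # only now is anything read from the stores
```

Each batch costs one slice read per store, so memory use stays at roughly one batch regardless of the file size. Whether AutoKeras's fit accepts such an object is exactly the open question raised in this comment.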

haifeng-jin · 2020-08-20T19:54:11Z

@GuileCyclone Thank you! I will look that up.

MissingShoes · 2020-10-08T08:46:31Z

I just documented a problem with tf.data.Dataset and the StructuredDataRegressor in #766.

haifeng-jin · 2020-11-02T02:15:46Z

Python generators are supported in 1.0.10 release.
Please refer to the tutorials here.
https://autokeras.com/tutorial/load/

Please let us know if it doesn't work for your case.

VictorReaver1999 · 2020-11-18T13:10:16Z

Hi. I know this is several months late, but like you, I am not getting any output. What does this have to do with AutoKeras? The issue would seem to be with matplotlib, since that is the library responsible for displaying the image. What am I missing? Thank you.

cibic89 · 2020-11-18T13:58:23Z

An AutoKeras contributor said so here.

It could be what you said; it could also be something else.
Why does it matter anyway? It works now.

VictorReaver1999 · 2020-11-19T15:19:36Z

Well, I am getting output now, but clf.fit isn't working: it fails with ValueError: Cannot take the length of shape with unknown rank. on the line clf.fit(train_dataset, epochs=60), which is the same issue that abdulsam faced. What's the workaround? The link you gave doesn't really explain much; it is a base-case tutorial.

theGOTOguy · 2020-11-27T07:43:44Z

@VictorReaver1999 I was also getting the "Cannot take the length of shape with unknown rank" error. The issue is that autokeras.utils.data_utils.batched doesn't know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# Autokeras data_utils gets confused by the generator.
# Just let it know that the data is indeed batched.
ak.utils.data_utils.batched = lambda _: True

clf = ak.ImageRegressor(
    max_trials=args.max_trials,
    directory=args.save_model_dir)

Update (above left for posterity):

I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:

def callable_iterator(generator, expected_batch_size):
  for img_batch, targets_batch in generator:
    if img_batch.shape[0] == expected_batch_size:
      yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))

Obviously the things starting with args. should be replaced with something appropriate to your code.

This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.

AlinaYablokova · 2020-12-04T10:20:54Z

Hello @haifeng-jin!

I have a problem: model.fit gets stuck when I use generators.

Versions
TensorFlow: 2.3.0
AutoKeras: 1.0.12
KerasTuner: 1.0.3

Hardware
Google Colab:
RAM 25 GB
GPU Tesla V100-SXM2-16GB

But I would like to run it on my computer:
RAM 16 GB
GeForce GTX 1060 6GB

So generators are important for me also due to hardware restrictions.

Data amount
X_train.shape - (3466413, 18, 5)
X_train_steady.shape - (3466413, 1)
Y_lat_train.shape - (3466413, 10)
Y_lon_train.shape - (3466413, 10)
X_val.shape - (323931, 18, 5)
X_val_steady.shape - (323931, 1)
Y_lat_val.shape - (323931, 10)
Y_lon_val.shape - (323931, 10)

Code

Option 1

def make_gen_callable(x_data,
                      x_data_steady,
                      y_lat_data,
                      y_lon_data,
                      batch_size):
    def generator_shuffle():
        max_index = len(x_data) - 1
        while 1:
            rows = np.random.randint(0, max_index)
            x = (x_data[rows], x_data_steady[rows])
            y = (y_lat_data[rows], y_lon_data[rows])
            yield x, y
    return generator_shuffle

batch_size = 256

train_gen = make_gen_callable(X_train,
                              X_train_steady,
                              Y_lat_train,
                              Y_lon_train,
                              batch_size)

val_gen = make_gen_callable(X_val,
                             X_val_steady,
                             Y_lat_val,
                             Y_lon_val,
                             batch_size)

output_types = ((tf.float32, tf.float32), (tf.float32, tf.float32))

output_shapes = (([18, 5], [1]), ([10], [10]))

train_dataset = tf.data.Dataset.from_generator(
    train_gen,
    output_types=output_types,
    output_shapes=output_shapes,
).batch(batch_size).repeat()

val_dataset = tf.data.Dataset.from_generator(
    val_gen,
    output_types=output_types,
    output_shapes=output_shapes,
).batch(batch_size).repeat()

Option 2

def make_gen_callable(x_data,
                      x_data_steady,
                      y_lat_data,
                      y_lon_data,
                      batch_size):
    def generator_shuffle():
        max_index = len(x_data) - 1
        while 1:
            rows = np.random.randint(0, max_index)
            x = (x_data[rows], x_data_steady[rows])
            y = (y_lat_data[rows], y_lon_data[rows])
            yield x, y
    return generator_shuffle

batch_size = 256

train_gen = make_gen_callable(X_train,
                              X_train_steady,
                              Y_lat_train,
                              Y_lon_train,
                              batch_size)

val_gen = make_gen_callable(X_val,
                             X_val_steady,
                             Y_lat_val,
                             Y_lon_val,
                             batch_size)

output_types = ((tf.float32, tf.float32), (tf.float32, tf.float32))

output_shapes = (([None, 18, 5], [None, 1]), ([None, 10], [None, 10]))

train_dataset = tf.data.Dataset.from_generator(
    train_gen,
    output_types=output_types,
    output_shapes=output_shapes,
)

val_dataset = tf.data.Dataset.from_generator(
    val_gen,
    output_types=output_types,
    output_shapes=output_shapes,
)

In both cases I got no results, even after 24 hours.

Without generators, the first results appear in the output immediately.

Could you please help me with it?
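
Two properties of the code above are worth noting. generator_shuffle loops forever (`while 1`), and `np.random.randint(0, max_index)` returns a single integer, so each yield is one row rather than a batch, while Option 2 declares a leading `None` batch dimension. Because the upper bound of `np.random.randint` is exclusive, the last row is also never drawn. Whether any of this explains the hang is not confirmed in this thread. The following is only a sketch of a finite, batch-yielding alternative, written with the standard library; `make_epoch_index_batches` and `seed` are hypothetical names introduced for illustration.

```python
import random


def make_epoch_index_batches(num_rows, batch_size, seed=None):
    """Return a callable that yields shuffled index batches for one finite epoch."""
    def index_batches():
        rng = random.Random(seed)
        order = list(range(num_rows))
        rng.shuffle(order)
        # Step through full batches only; a trailing partial batch is dropped.
        for start in range(0, num_rows - batch_size + 1, batch_size):
            yield order[start:start + batch_size]
    return index_batches


# 10 rows with a batch size of 4 give two full batches; 2 rows are left out.
batches = list(make_epoch_index_batches(10, 4, seed=0)())
seen = [index for batch in batches for index in batch]
```

With NumPy arrays, indexing by one of these lists (for example `x_data[rows]`) returns an array whose first dimension is the batch size, which is the layout that Option 2's output_shapes describes. The generator ends after one pass, so anything that iterates over the dataset once will terminate.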

arirkts369 · 2020-12-21T08:51:59Z

@VictorReaver1999 I was also getting the "Cannot take the length of shape with unknown rank" error. The issue is that autokeras.utils.data_utils.batched doesn't know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator),
    output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator),
    output_types=(tf.float32, tf.float32))

# Autokeras data_utils gets confused by the generator.
# Just let it know that the data is indeed batched.
ak.utils.data_utils.batched = lambda _: True

clf = ak.ImageRegressor(
    max_trials=args.max_trials,
    directory=args.save_model_dir)

Update (above left for posterity):

I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:

def callable_iterator(generator, expected_batch_size):
  for img_batch, targets_batch in generator:
    if img_batch.shape[0] == expected_batch_size:
      yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))

Obviously the things starting with args. should be replaced with something appropriate to your code.

This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.

Hi, I have 3 classes and a batch size of 32. When I add this patch to my code, it gives me an error:

ValueError: generator yielded an element of shape (32, 3) where an element of shape (None, 1) was expected

If I also change the values (None, 1) in the patch to (32, 3), it runs, but nothing happens at all.

theGOTOguy · 2020-12-21T09:49:14Z

@arirkts369

You probably need to use tf.TensorShape([None, 3]) instead of tf.TensorShape([None, 1]) in the example above. In my application, I was regressing to a scalar but in your case you are classifying to one of three classes.
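
The ValueError quoted above comes from a mismatch between the declared output_shapes and what the generator actually yields. The check can be illustrated with a simplified stand-in written for this example. It is not TensorFlow's implementation, and `is_compatible` is a hypothetical helper; it only assumes that `None` in a declared shape matches any size.

```python
def is_compatible(actual_shape, declared_shape):
    """Return True if actual_shape fits declared_shape; None matches any size."""
    if len(actual_shape) != len(declared_shape):
        return False
    return all(
        declared is None or declared == actual
        for actual, declared in zip(actual_shape, declared_shape)
    )


# Targets for a batch of 32 samples with 3 classes.
target_batch_shape = (32, 3)

scalar_regression = is_compatible(target_batch_shape, (None, 1))  # the original patch
three_classes = is_compatible(target_batch_shape, (None, 3))      # the suggested fix
fixed_batch = is_compatible(target_batch_shape, (32, 3))          # hard-coded batch size
```

Declaring `(None, 3)` keeps the batch dimension flexible while pinning the class dimension, whereas `(32, 3)` only accepts batches of exactly 32 samples.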

haifeng-jin added feature request pinned labels Feb 19, 2020

his0car mentioned this issue May 29, 2020

AutoKeras is super limited without fit_generator #1152

Closed

lsrock1 mentioned this issue Jun 2, 2020

Does ImageClassifier support tf.data ? #1158

Closed

telyn mentioned this issue Jun 23, 2020

ImageDataGenerator for autokeras #1198

Closed

haifeng-jin mentioned this issue Jun 24, 2020

Provide an example for using tfrecord with AutoKeras #1060

Closed

This was referenced Jun 25, 2020

[Feature request]fit_generator style training from Keras #540

Closed

hypermodel value error with ImageClassifier #1205

Closed

haifeng-jin added this to To Do in AutoKeras Management via automation Aug 15, 2020

haifeng-jin added this to the 1.0.7 milestone Aug 19, 2020

haifeng-jin modified the milestones: 1.0.7, 1.0.8, 1.0.9 Aug 21, 2020

haifeng-jin removed this from the 1.0.9 milestone Sep 28, 2020

haifeng-jin added this to the 1.0.10 milestone Sep 28, 2020

haifeng-jin mentioned this issue Oct 2, 2020

use my own generator,the autokeras always reading data,not start work. #1343

Closed

haifeng-jin changed the title ~~How to use generator in fit?~~ Support python generators Oct 10, 2020

haifeng-jin modified the milestones: 1.0.10, 1.0.11 Oct 10, 2020

haifeng-jin closed this as completed Nov 2, 2020

AutoKeras Management automation moved this from To Do to Done Nov 2, 2020
