Support python generators #984

Closed
junyongyou opened this issue Feb 18, 2020 · 24 comments

Comments

@junyongyou

junyongyou commented Feb 18, 2020

I have a simple task: finding the best CNN architecture for image regression. However, my dataset is too large to be loaded into memory at once. It seems that in the current release, ImageRegressor only supports a fit method that requires all the data (x and y) to be loaded in memory. How can I use a generator with AutoKeras? I have checked the closed issue #204, but it seems it was not solved.

I have already tried tf.data by converting my generator to a tf.data.Dataset, but it didn't work. For example:

    dataset = tf.data.Dataset.from_generator(generate_batch, (tf.float32, tf.float32))
    vq_predictor = ak.ImageRegressor()
    for i, (X, y) in enumerate(dataset):
        X_dataset = tf.data.Dataset.from_tensors(X)
        y_dataset = tf.data.Dataset.from_tensors(y)
        vq_predictor.fit(X_dataset, y_dataset, validation_split=0.2)

Then I got this error:

File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\tasks\image.py", line 222, in fit
**kwargs)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 231, in fit
validation_split=validation_split)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\auto_model.py", line 313, in _prepare_data
dataset, validation_data = utils.split_dataset(dataset, validation_split)
File "C:\Users\junyong\AppData\Local\Continuum\anaconda3\envs\tensorflow2\lib\site-packages\autokeras\utils.py", line 69, in split_dataset
raise ValueError('The dataset should at least contain 2 '
ValueError: The dataset should at least contain 2 instances to be split.

Any suggestions are highly appreciated.
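For the underlying problem of training on arrays that do not fit in memory, one option, assuming the data can be saved as `.npy` files (the paths below are hypothetical), is to memory-map the files with NumPy and yield one batch at a time. This is a minimal sketch, not an AutoKeras API; `make_memmap_batches` is a hypothetical helper that returns the kind of callable `tf.data.Dataset.from_generator` expects.

```python
import os
import tempfile

import numpy as np


def make_memmap_batches(x_path, y_path, batch_size):
    """Return a callable that yields (x, y) batches read lazily from .npy files.

    Hypothetical helper: np.load(..., mmap_mode="r") maps the file instead of
    reading it into RAM, so only the rows sliced for each batch are loaded.
    """
    def generator():
        x = np.load(x_path, mmap_mode="r")
        y = np.load(y_path, mmap_mode="r")
        for start in range(0, len(x), batch_size):
            # np.asarray copies just this slice out of the memory map.
            yield (np.asarray(x[start:start + batch_size], dtype=np.float32),
                   np.asarray(y[start:start + batch_size], dtype=np.float32))
    return generator


# Usage with small stand-in arrays written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
x_file = os.path.join(tmp_dir, "x.npy")
y_file = os.path.join(tmp_dir, "y.npy")
np.save(x_file, np.random.rand(10, 8, 8, 3))
np.save(y_file, np.random.rand(10, 1))

batches = list(make_memmap_batches(x_file, y_file, batch_size=4)())
print([xb.shape[0] for xb, _ in batches])  # [4, 4, 2]
```

Because the factory is called anew for each pass, every epoch reopens the maps and starts from the first row.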

@haifeng-jin
Member

We have not tested with generators yet; I will look into this.
It seems AutoKeras detected fewer than 2 instances in the dataset.
I assume it has more.

You may try providing the validation_data yourself.
It won't have this issue but may have some other errors.
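A minimal sketch of supplying the validation data yourself, assuming the inputs are indexable arrays: split the row indices once, then build two separate generator callables so the training and validation streams never overlap and each pass over them ends. `make_split_generators` is a hypothetical name, not part of AutoKeras.

```python
import numpy as np


def make_split_generators(x, y, batch_size, validation_split=0.2, seed=0):
    """Split row indices once, then return (train_gen, val_gen) callables.

    Each callable yields finite (x_batch, y_batch) tuples drawn from its own
    subset of rows, so the two streams are disjoint.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))
    num_val = int(len(x) * validation_split)
    val_idx, train_idx = indices[:num_val], indices[num_val:]

    def make_gen(idx):
        def gen():
            for start in range(0, len(idx), batch_size):
                batch = idx[start:start + batch_size]
                yield x[batch], y[batch]
        return gen

    return make_gen(train_idx), make_gen(val_idx)


x = np.arange(100, dtype=np.float32).reshape(100, 1)
y = 2 * x
train_gen, val_gen = make_split_generators(x, y, batch_size=16)
train_x = np.concatenate([xb for xb, _ in train_gen()])
val_x = np.concatenate([xb for xb, _ in val_gen()])
print(len(train_x), len(val_x))  # 80 20
```

In TensorFlow, each callable would then be wrapped with `tf.data.Dataset.from_generator`, and the validation dataset passed to `fit` through `validation_data=` instead of relying on `validation_split`.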

@ciberger

ciberger commented May 5, 2020

Description

Hi! I'm dealing with the same situation where I want to use a generator to read from a massive training set. If generators are not tested, can you recommend an alternative approach?

Thanks in advance! 🙌

Expected Behavior

I expected to be able to train the model using a tf.data.Dataset built with from_generator as the training set.

Alternatively, I expected to be able to convert such a dataset into train/test inputs for AutoKeras.

Reproducible code

import pathlib

import autokeras as ak
import numpy as np  # needed for np.array below
import tensorflow as tf

# Define general parameters
BATCH_SIZE = 32
IMG_HEIGHT = 150
IMG_WIDTH = 150

data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

data_dir = pathlib.Path(data_dir)

CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
# array(['roses', 'sunflowers', 'daisy', 'dandelion', 'tulips'], dtype='<U10')


# ImageDataGenerator using Keras
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Define the parameters
train_data_gen = train_gen.flow_from_directory(
    batch_size=BATCH_SIZE,
    directory=data_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='categorical'
)
# Found 3670 images belonging to 5 classes.

# Wrapping Keras generator
train_ds = tf.data.Dataset.from_generator(
    lambda: train_data_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, 3],
                   [BATCH_SIZE, len(CLASS_NAMES)]))

images, labels = next(train_data_gen)
print(images.dtype, images.shape)
print(labels.dtype, labels.shape)
# float32 (32, 150, 150, 3)
# float32 (32, 5)

# Initialize the image classifier.
clf = ak.ImageClassifier(max_trials=2)
# Feed the image classifier with training data.
clf.fit(train_ds, epochs=2)

Error message

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
 in 
      2 clf = ak.ImageClassifier(max_trials=2)
      3 # Feed the image classifier with training data.
----> 4 clf.fit(train_ds, epochs=2)

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/tasks/image.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
    120                     validation_split=validation_split,
    121                     validation_data=validation_data,
--> 122                     **kwargs)
    123 
    124 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, **kwargs)
    237             y=y,
    238             validation_data=validation_data,
--> 239             validation_split=validation_split)
    240 
    241         # Process the args.

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in _prepare_data(self, x, y, validation_data, validation_split)
    319         if validation_data is None and validation_split:
    320             self._split_dataset = True
--> 321             dataset, validation_data = utils.split_dataset(dataset, validation_split)
    322         return dataset, validation_data
    323 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/utils.py in split_dataset(dataset, validation_split)
     79         A tuple of two tf.data.Dataset. The training set and the validation set.
     80     """
---> 81     num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
     82     if num_instances < 2:
     83         raise ValueError('The dataset should at least contain 2 '

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in reduce(self, initial_state, reduce_func)
   1932             f=reduce_func,
   1933             output_shapes=structure.get_flat_tensor_shapes(state_structure),
-> 1934             output_types=structure.get_flat_tensor_types(state_structure)))
   1935 
   1936   def unbatch(self):

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py in reduce_dataset(input_dataset, initial_state, other_arguments, f, output_types, output_shapes, use_inter_op_parallelism, name)
   4659         pass  # Add nodes to the TensorFlow graph.
   4660     except _core._NotOkStatusException as e:
-> 4661       _ops.raise_from_not_ok_status(e, name)
   4662   # Add nodes to the TensorFlow graph.
   4663   if not isinstance(output_types, (list, tuple)):

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6604   message = e.message + (" name: " + name if name is not None else "")
   6605   # pylint: disable=protected-access
-> 6606   six.raise_from(core._status_to_exception(e.code, message), None)
   6607   # pylint: enable=protected-access
   6608 

~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.
Traceback (most recent call last):

  File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
    ret = func(*args)

  File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 825, in generator_py_func
    "of shape %s was expected." % (ret_array.shape, expected_shape))

ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.


	 [[{{node PyFunc}}]] [Op:ReduceDataset]

Setup Details

  • OS type and version: macOS Catalina (10.15)
  • Python: 3.7.4
  • autokeras: 1.0.2
  • keras-tuner: 1.0.1
  • scikit-learn: 0.22.2.post1
  • numpy: 1.18.3
  • pandas: 1.0.3
  • tensorflow: 2.1.0
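The shape in the error matches the size of the last batch: with 3670 images and `BATCH_SIZE = 32`, `flow_from_directory` produces 114 full batches followed by one batch of 22 images, which conflicts with the batch dimension fixed at `BATCH_SIZE` in `output_shapes`. A quick check in plain Python (`batch_sizes` is a hypothetical helper):

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of consecutive batches when the final partial batch is kept as-is."""
    full, remainder = divmod(num_samples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(3670, 32)
print(len(sizes), sizes[-1])  # 115 22
```

Declaring the batch dimension as `None` in `output_shapes` (for example `[None, IMG_HEIGHT, IMG_WIDTH, 3]`) avoids this particular shape check failing on the last batch; another option, used later in this thread, is to skip incomplete batches.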

@haifeng-jin
Member

Currently, we don't support generators. They don't seem to be the mainstream way of doing this in the future of tf.keras. We will try to support TFRecords, which we haven't really tested yet.

@ciberger

Hi @haifeng-jin. Thanks for your reply.

Are you aware of a way to convert a dataset created with tf.data.Dataset.from_generator into an ndarray that can be used as a training set?

Thanks
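If the data does fit in memory, a finite stream of batches can simply be concatenated into arrays and passed to `fit` as `x` and `y`. A minimal NumPy sketch, assuming the batches are `(x_batch, y_batch)` pairs; `materialize` and `endless_batches` are hypothetical names. A Keras `flow_from_directory` iterator never stops on its own, so the sketch caps the number of batches read (for one epoch, `len(iterator)` batches). Recent TensorFlow versions also offer `Dataset.as_numpy_iterator()` to obtain NumPy tuples from a `tf.data.Dataset`.

```python
import itertools

import numpy as np


def materialize(batch_iterable, max_batches=None):
    """Concatenate (x_batch, y_batch) pairs into two in-memory arrays.

    max_batches caps the number of batches read, which is required for
    generators that repeat forever.
    """
    xs, ys = [], []
    for x_batch, y_batch in itertools.islice(batch_iterable, max_batches):
        xs.append(np.asarray(x_batch))
        ys.append(np.asarray(y_batch))
    return np.concatenate(xs), np.concatenate(ys)


def endless_batches():
    # Stand-in for an infinitely repeating Keras iterator: 3 batches per epoch.
    while True:
        for start in (0, 4, 8):
            size = min(4, 10 - start)
            yield np.zeros((size, 2, 2, 3)), np.ones((size, 5))


x_all, y_all = materialize(endless_batches(), max_batches=3)
print(x_all.shape, y_all.shape)  # (10, 2, 2, 3) (10, 5)
```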

@haruiz

haruiz commented May 17, 2020

@ciberger, this code seems to work; give it a shot:

import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
data_dir = "dataset/train"
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

def preprocess(img):
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    return img / 255.0  # note: rescale=1./255 below scales by 1/255 a second time

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                              horizontal_flip=True,
                                              validation_split=0.2,
                                              preprocessing_function=preprocess)

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)

val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# for image, label in train_dataset.take(1):
#   print("Image shape: ", image.numpy().shape)
#   print("Label: ", label.numpy())
#   plt.imshow(image.numpy()[0] * 255)
#   plt.show()


clf = ak.ImageClassifier(max_trials=10)
#Feed the tensorflow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
#Evaluate the best model.
print(clf.evaluate(val_dataset))

Then you can feed your fit function with the tf.datasets
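One thing worth checking with this approach: the iterator returned by `flow_from_directory` repeats indefinitely, so `for img_batch, targets_batch in generator` never ends and the resulting dataset is infinite. That iterator is also a `keras.utils.Sequence`, so `len()` gives the number of batches per epoch and batches can be fetched by index. The sketch below bounds one epoch that way; `one_epoch` and `ToySequence` are hypothetical, and the toy class only stands in for a real `DirectoryIterator` to keep the example self-contained.

```python
import numpy as np


def one_epoch(sequence):
    """Yield exactly len(sequence) batches from a Sequence-like object."""
    for i in range(len(sequence)):
        yield sequence[i]
    # Keras iterators reshuffle in on_epoch_end(); call it when the object has one.
    on_epoch_end = getattr(sequence, "on_epoch_end", None)
    if on_epoch_end is not None:
        on_epoch_end()


class ToySequence:
    """Hypothetical stand-in for a DirectoryIterator: 10 samples, batch size 4."""

    def __init__(self, num_samples=10, batch_size=4):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __len__(self):
        # ceil(num_samples / batch_size), the same count Keras iterators report.
        return -(-self.num_samples // self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        size = min(self.batch_size, self.num_samples - start)
        return np.zeros((size, 2)), np.zeros((size, 1))


print([x.shape[0] for x, _ in one_epoch(ToySequence())])  # [4, 4, 2]
```

With the real iterator, this would take the place of `callable_iterator`, e.g. `tf.data.Dataset.from_generator(lambda: one_epoch(train_generator), output_types=(tf.float32, tf.float32))`.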

@cibic89

cibic89 commented May 29, 2020

Quoting @haruiz's code, adapted to the flower_photos dataset:

import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224
data_dir = tf.keras.utils.get_file(
  'flower_photos',
  'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
  untar=True)
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

def preprocess(img):
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    return img / 255.0

image_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,
                                              horizontal_flip=True,
                                              validation_split=0.2,
                                              preprocessing_function=preprocess)

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)

val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
     batch_size=BATCH_SIZE,
     shuffle=True,
     #class_mode="categorical",
     target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

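# Loop variables renamed so they don't shadow the `image` module used by preprocess().
for img_batch, label_batch in train_dataset.take(1):
  print("Image shape: ", img_batch.numpy().shape)
  print("Label: ", label_batch.numpy()[0])
  # RGB images have shape (224, 224, 3), so there is no size-1 axis to squeeze.
  # preprocess() and rescale=1./255 both divide by 255, hence * 255 to get back to [0, 1].
  plt.imshow(img_batch.numpy()[0] * 255)
  plt.show()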

clf = ak.ImageClassifier(max_trials=10)
#Feed the tensorflow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
#Evaluate the best model.
print(clf.evaluate(val_dataset))

Then you can feed your fit function with the tf.datasets

Your example produces no output for me. I'm using Google Colab with GPU support, and I have had it working before (albeit with poor predictive performance).

UPDATE: I know the generator works, because I tested it with the adapted code above.
UPDATE: I tried it on my local machine and the same issue happens (TensorFlow 2.2.0, AutoKeras 1.0.2).
UPDATE: The problem is with AutoKeras (#1075).

@Pan-EDU

Pan-EDU commented Jun 2, 2020

Hi @haruiz & @cibic89,

I downloaded a few MNIST images into a local folder to try your method, but it just printed the following message and then the system hung:

Found 9 images belonging to 3 classes

While it was hanging, I could see it using a lot of CPU:

CMD                %MEM  %CPU
python3 script.py  0.7   117

Could you tell me how to deal with this situation?
I used TensorFlow 2.1.0 and the current version of AutoKeras.
Thank you!
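High CPU usage with no progress may be a sign of an endless dataset: the AutoKeras 1.0.2 traceback earlier in this thread shows `split_dataset` counting elements with `dataset.reduce`, which cannot finish if the wrapped generator never stops (and `flow_from_directory` iterators repeat forever). A cheap sanity check before calling `fit` is to count batches with a cap; `count_batches` is a hypothetical helper.

```python
import itertools


def count_batches(generator, limit=10_000):
    """Count items from an iterable, giving up after `limit` items.

    Returns None when the limit is exceeded, which suggests the iterable
    repeats forever.
    """
    count = sum(1 for _ in itertools.islice(generator, limit + 1))
    return None if count > limit else count


print(count_batches(iter(range(3))))                 # 3
print(count_batches(itertools.count(), limit=100))  # None
```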

@haifeng-jin
Member

We are currently short-handed. It would be great if anyone could contribute an example using TFRecord and tf.data. @naitslup has provided this example (#1060 (comment)), which would be a good start.

@cibic89

cibic89 commented Jun 24, 2020

(Quoting @haifeng-jin's request above for a TFRecord and tf.data example.)

Happy to do this.

@abdulsam

(Quoting @haruiz's code above.)

I'm getting the error ValueError: Cannot take the length of shape with unknown rank. on the line clf.fit(train_dataset, epochs=60)

@cibic89

cibic89 commented Jul 18, 2020

I just discovered this one: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator
You might be able to wrap your generator into a tf.data.Dataset with it.

@haifeng-jin Can I contribute a tutorial for using an image generator with AutoKeras?

@haifeng-jin
Member

@cibic89 Yes. Thank you! Please put it in this folder https://github.com/keras-team/autokeras/tree/master/docs/py
You may also try to run your code in colab to make sure it works.

@haifeng-jin
Member

@abdulsam Hi Abdus, it seems you have encountered a lot of issues. Could we set up a call to help you resolve them? You could also help us by answering some of our user-study questions and giving feedback. You can reach me on Slack. Thank you!

@haifeng-jin haifeng-jin added this to To Do in AutoKeras Management via automation Aug 15, 2020
@haifeng-jin haifeng-jin added this to the 1.0.7 milestone Aug 19, 2020
@GuileCyclone

GuileCyclone commented Aug 20, 2020

@haifeng-jin Could tensorflow.keras.utils.Sequence be another input type to support? It is similar to PyTorch's Dataset class and may be easy to apply in code. I used a Sequence for a huge NumPy array stored in an h5py file; each batch is only read into RAM when __getitem__ is called.
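A minimal sketch of the `Sequence` idea, assuming the data sits in an array-like object that supports slicing without loading everything (an `h5py` dataset or a NumPy memmap both do). In real code the class would subclass `tf.keras.utils.Sequence`; here a plain class with the same two methods keeps the example dependency-free. `ArraySequence` is a hypothetical name.

```python
import math

import numpy as np


class ArraySequence:
    """Batch access to array-like x/y that are only read when indexed.

    Mirrors the tf.keras.utils.Sequence protocol (__len__ and __getitem__).
    """

    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        rows = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # For an h5py dataset, this slice reads only these rows from disk.
        return np.asarray(self.x[rows]), np.asarray(self.y[rows])


seq = ArraySequence(np.arange(10).reshape(10, 1), np.arange(10), batch_size=3)
print(len(seq), seq[3][0].ravel())  # 4 [9]
```

Raising `IndexError` past the end also lets plain Python iteration (`for batch in seq`) stop after the last batch.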

@haifeng-jin
Member

@GuileCyclone Thank you! I will look that up.

@haifeng-jin haifeng-jin modified the milestones: 1.0.7, 1.0.8, 1.0.9 Aug 21, 2020
@haifeng-jin haifeng-jin removed this from the 1.0.9 milestone Sep 28, 2020
@MissingShoes

I just documented a problem with tf.Dataset and the StructuredDataRegressor in #766

@haifeng-jin haifeng-jin changed the title from "How to use generator in fit?" to "Support python generators" Oct 10, 2020
@haifeng-jin haifeng-jin modified the milestones: 1.0.10, 1.0.11 Oct 10, 2020
@haifeng-jin
Member

Python generators are supported as of the 1.0.10 release.
Please refer to the tutorial here:
https://autokeras.com/tutorial/load/

Please let us know if it doesn't work for your case.

AutoKeras Management automation moved this from To Do to Done Nov 2, 2020
@VictorReaver1999

(Quoting @cibic89's comment above.)

Hi. I know this is several months late, but like you, I am not getting any output. What does it have to do with AutoKeras? I think the issue is with matplotlib, since it is the library responsible for displaying the image. What am I missing? Thank you.

@cibic89

cibic89 commented Nov 18, 2020

(Quoting @VictorReaver1999's question above: what does the missing output have to do with AutoKeras rather than matplotlib?)

An AutoKeras contributor said so here.

It could be what you said; it could also be something else.
Why does it matter anyway? It works now.

@VictorReaver1999

(Quoting @cibic89's reply above.)

Well, I am getting output now, but clf.fit isn't working: it raises ValueError: Cannot take the length of shape with unknown rank. on the line clf.fit(train_dataset, epochs=60), which is the same issue abdulsam faced. What's the workaround? The link you gave doesn't explain much; it is a basic tutorial.

@theGOTOguy

theGOTOguy commented Nov 27, 2020

@VictorReaver1999 I was also getting the "Cannot take the length of shape with unknown rank" error. The issue is that autokeras.utils.data_utils.batched doesn't know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# Autokeras data_utils gets confused by the generator.
# Just let it know that the data is indeed batched.
ak.utils.data_utils.batched = lambda _: True

clf = ak.ImageRegressor(
    max_trials=args.max_trials,
    directory=args.save_model_dir)

Update (above left for posterity):

I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:

def callable_iterator(generator, expected_batch_size):
  for img_batch, targets_batch in generator:
    if img_batch.shape[0] == expected_batch_size:
      yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))

The values starting with args. should, of course, be replaced with ones appropriate to your code.

This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.

@AlinaYablokova

Hello @haifeng-jin!

I have a problem with model.fit getting stuck when using generators.

Versions
TensorFlow: 2.3.0
AutoKeras: 1.0.12
KerasTuner: 1.0.3

Hardware
Google Colab:
RAM 25 GB
GPU Tesla V100-SXM2-16GB

But I would like to run it on my computer:
RAM 16 GB
GeForce GTX 1060 6GB

So generators are also important to me because of these hardware restrictions.

Data amount
X_train.shape - (3466413, 18, 5)
X_train_steady.shape - (3466413, 1)
Y_lat_train.shape - (3466413, 10)
Y_lon_train.shape - (3466413, 10)
X_val.shape - (323931, 18, 5)
X_val_steady.shape - (323931, 1)
Y_lat_val.shape - (323931, 10)
Y_lon_val.shape - (323931, 10)
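For a rough sense of scale, you can estimate the raw size of these arrays from the shapes listed. The sketch assumes dense NumPy arrays of float32 (4 bytes per value) or float64 (8 bytes, NumPy's default float type). It ignores any extra copies that TensorFlow or AutoKeras might make during training. The variable names are illustrative.

```python
import math

# Array shapes as listed above (number of rows first).
train_shapes = [(3466413, 18, 5), (3466413, 1), (3466413, 10), (3466413, 10)]
val_shapes = [(323931, 18, 5), (323931, 1), (323931, 10), (323931, 10)]


def total_bytes(shapes, itemsize):
    # Dense arrays: number of elements times bytes per element.
    return sum(math.prod(shape) * itemsize for shape in shapes)


train_f32 = total_bytes(train_shapes, 4)  # float32
val_f32 = total_bytes(val_shapes, 4)
all_f64 = total_bytes(train_shapes + val_shapes, 8)  # float64

GIB = 1024 ** 3
print(f"train float32: {train_f32 / GIB:.2f} GiB")
print(f"val float32:   {val_f32 / GIB:.2f} GiB")
print(f"all float64:   {all_f64 / GIB:.2f} GiB")
```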

Code

Option 1

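import numpy as np
import tensorflow as tf

def make_gen_callable(x_data,
                      x_data_steady,
                      y_lat_data,
                      y_lon_data,
                      batch_size):
    def generator_shuffle():
        # Loops forever, yielding one randomly chosen sample per step.
        while True:
            # randint's upper bound is exclusive, so pass len(x_data) to allow every row.
            row = np.random.randint(0, len(x_data))
            x = (x_data[row], x_data_steady[row])
            y = (y_lat_data[row], y_lon_data[row])
            yield x, y
    return generator_shuffle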

batch_size = 256

train_gen = make_gen_callable(X_train,
                              X_train_steady,
                              Y_lat_train,
                              Y_lon_train,
                              batch_size)

val_gen = make_gen_callable(X_val,
                             X_val_steady,
                             Y_lat_val,
                             Y_lon_val,
                             batch_size)

output_types = ((tf.float32, tf.float32), (tf.float32, tf.float32))

output_shapes = (([18, 5], [1]), ([10], [10]))

train_dataset = tf.data.Dataset.from_generator(
    train_gen,
    output_types=output_types,
    output_shapes=output_shapes,
).batch(batch_size).repeat()

val_dataset = tf.data.Dataset.from_generator(
    val_gen,
    output_types=output_types,
    output_shapes=output_shapes,
).batch(batch_size).repeat()
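In Option 1, generator_shuffle never stops, so the dataset built from it is infinite and the trailing .repeat() adds nothing. In tf.keras, an epoch over an infinite dataset only ends if a step count such as steps_per_epoch is imposed. That would be consistent with the stall described below, although this thread does not confirm it. A minimal sketch of a finite alternative follows, assuming the same four arrays. The name `make_epoch_gen` is hypothetical. Each call to the returned callable makes one shuffled pass over all rows and then stops, so `.batch(batch_size)` without `.repeat()` gives a finite dataset.

```python
import numpy as np


def make_epoch_gen(x_data, x_data_steady, y_lat_data, y_lon_data, seed=None):
    """Return a callable whose generator makes ONE shuffled pass over the rows."""
    rng = np.random.default_rng(seed)

    def generator_one_epoch():
        # A fresh permutation each time the generator is (re)created.
        for row in rng.permutation(len(x_data)):
            x = (x_data[row], x_data_steady[row])
            y = (y_lat_data[row], y_lon_data[row])
            yield x, y

    return generator_one_epoch


# Small random arrays with the same per-row shapes as above.
n = 10
gen = make_epoch_gen(
    np.random.rand(n, 18, 5).astype(np.float32),
    np.arange(n, dtype=np.float32).reshape(n, 1),
    np.random.rand(n, 10).astype(np.float32),
    np.random.rand(n, 10).astype(np.float32),
    seed=0,
)
samples = list(gen())
print(len(samples))  # 10: every row exactly once, then the generator stops
```

The per-sample shapes match Option 1's output_shapes, so this callable could be passed to tf.data.Dataset.from_generator in place of train_gen.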

Option 2

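import numpy as np
import tensorflow as tf

def make_gen_callable(x_data,
                      x_data_steady,
                      y_lat_data,
                      y_lon_data,
                      batch_size):
    def generator_shuffle():
        # Loops forever, yielding one randomly chosen sample per step.
        while True:
            # randint's upper bound is exclusive, so pass len(x_data) to allow every row.
            row = np.random.randint(0, len(x_data))
            x = (x_data[row], x_data_steady[row])
            y = (y_lat_data[row], y_lon_data[row])
            yield x, y
    return generator_shuffle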

batch_size = 256

train_gen = make_gen_callable(X_train,
                              X_train_steady,
                              Y_lat_train,
                              Y_lon_train,
                              batch_size)

val_gen = make_gen_callable(X_val,
                             X_val_steady,
                             Y_lat_val,
                             Y_lon_val,
                             batch_size)

output_types = ((tf.float32, tf.float32), (tf.float32, tf.float32))

output_shapes = (([None, 18, 5], [None, 1]), ([None, 10], [None, 10]))

train_dataset = tf.data.Dataset.from_generator(
    train_gen,
    output_types=output_types,
    output_shapes=output_shapes,
)

val_dataset = tf.data.Dataset.from_generator(
    val_gen,
    output_types=output_types,
    output_shapes=output_shapes,
)
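In Option 2, the declared output_shapes have a leading None batch dimension. However, generator_shuffle still yields single samples of shape [18, 5], so the declared rank and the actual rank disagree. If the batch dimension is meant to come from the generator, the generator has to yield whole batches. A minimal sketch follows, assuming the same arrays and assuming that an incomplete final batch should be skipped, as suggested earlier in this thread. The name `make_batch_gen` is hypothetical.

```python
import numpy as np


def make_batch_gen(x_data, x_data_steady, y_lat_data, y_lon_data, batch_size, seed=None):
    """Return a callable whose generator yields full batches from one shuffled pass."""
    rng = np.random.default_rng(seed)

    def generator_batches():
        order = rng.permutation(len(x_data))
        # Stop before a short tail so every batch has exactly batch_size rows.
        for start in range(0, len(order) - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            x = (x_data[idx], x_data_steady[idx])
            y = (y_lat_data[idx], y_lon_data[idx])
            yield x, y

    return generator_batches


# Small random arrays with the same per-row shapes as above.
n = 100
gen = make_batch_gen(
    np.random.rand(n, 18, 5).astype(np.float32),
    np.random.rand(n, 1).astype(np.float32),
    np.random.rand(n, 10).astype(np.float32),
    np.random.rand(n, 10).astype(np.float32),
    batch_size=32,
    seed=0,
)
batches = list(gen())
print(len(batches))            # 3 full batches; the remaining 4 rows are skipped
print(batches[0][0][0].shape)  # (32, 18, 5)
```

Each yielded element then has the ([None, 18, 5], [None, 1]), ([None, 10], [None, 10]) structure declared in Option 2. Because the generator ends after one pass, it also avoids the infinite-dataset issue.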

In both cases I got no results, even after 24 hours.


Without generators, the first results appear in the output immediately.

Could you please help me with it?

@arirkts369

@VictorReaver1999 I was also getting the "Cannot take the length of shape with unknown rank" error. The issue is that autokeras.utils.data_utils.batched doesn't know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.

train_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(train_generator),output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(lambda: callable_iterator(val_generator),output_types=(tf.float32, tf.float32))

# Autokeras data_utils gets confused by the generator.
# Just let it know that the data is indeed batched.
ak.utils.data_utils.batched = lambda _: True

clf = ak.ImageRegressor(
    max_trials=args.max_trials,
    directory=args.save_model_dir)
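The assignment above replaces batched for the rest of the Python process. The standard library's unittest.mock.patch.object can scope a patch like this instead. The sketch below applies it to a hypothetical stand-in module, not to AutoKeras itself, and it has not been checked against how AutoKeras internals look the function up. With AutoKeras, the fit call would have to run inside the with block. The sketch also shows a general Python pitfall: code that captured a direct reference before patching keeps calling the original function.

```python
import types
from unittest import mock

# Hypothetical stand-in module; in the thread the real target is ak.utils.data_utils.
data_utils = types.ModuleType("data_utils")


def _strict_batched(dataset):
    # Stand-in for a check that fails on generator output of unknown rank.
    raise ValueError("Cannot take the length of shape with unknown rank.")


data_utils.batched = _strict_batched


def prepare(dataset):
    # Looks up data_utils.batched at call time, so it sees an active patch.
    return data_utils.batched(dataset)


# A reference captured before patching (like "from data_utils import batched").
early_ref = data_utils.batched

with mock.patch.object(data_utils, "batched", lambda _: True):
    patched_result = prepare(object())  # True: the patch is visible here
    try:
        early_ref(object())
        early_ref_saw_patch = True
    except ValueError:
        early_ref_saw_patch = False  # the captured reference still calls the original

# Leaving the with-block restores the original function automatically.
restored = data_utils.batched is _strict_batched
```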

Update (above left for posterity):

I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:

def callable_iterator(generator, expected_batch_size):
  for img_batch, targets_batch in generator:
    if img_batch.shape[0] == expected_batch_size:
      yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator, args.batch_size),
    output_types=(tf.float32, tf.float32),
    output_shapes=(tf.TensorShape([None, args.crop_size, args.crop_size, 1 if args.gray else 3]),
                   tf.TensorShape([None, 1])))

Obviously the things starting with args. should be replaced with something appropriate to your code.

This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.

Hi, I have 3 classes and a batch size of 32. For some reason, when I try to add this patch to my code, it gives me an error:

ValueError: generator yielded an element of shape (32, 3) where an element of shape (None, 1) was expected

I also tried changing (None, 1) in the patch to (32, 3). Then it runs, but nothing happens at all.

@theGOTOguy

@arirkts369

You probably need to use tf.TensorShape([None, 3]) instead of tf.TensorShape([None, 1]) in the example above. In my application I was regressing to a scalar, but in your case you are classifying into one of three classes.
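One way to avoid hard-coding the target shape is to derive it from a sample batch: keep the batch dimension as None and copy the trailing dimensions. The helper `batched_shape_from_sample` is hypothetical. The sketch assumes one-hot targets, which the (32, 3) shape in the error message suggests. The resulting list could then be wrapped in tf.TensorShape(...) in output_shapes.

```python
import numpy as np


def batched_shape_from_sample(batch):
    """Hypothetical helper: unknown batch dimension plus the sample's trailing dims."""
    return [None] + list(batch.shape[1:])


# One-hot targets for 3 classes with a batch size of 32, as in the error message.
labels = np.random.randint(0, 3, size=32)
targets_batch = np.eye(3, dtype=np.float32)[labels]
print(targets_batch.shape)                       # (32, 3)
print(batched_shape_from_sample(targets_batch))  # [None, 3]
```

Keeping None rather than 32 matches the callable_iterator example. Its filter already guarantees the batch size, so the static shape does not need to state it.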
