Support python generators #984
Comments
Currently, we have not tested with generators yet. You may try providing the validation_data yourself.
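The suggestion above — providing validation_data yourself rather than letting AutoKeras split an unsized dataset — can be sketched roughly like this. This is a minimal, hypothetical example; the helper name, data, and split ratio are illustrative, not AutoKeras code:

```python
# Hedged sketch: split in-memory data manually and pass validation_data
# explicitly, so AutoKeras never has to count a generator-backed dataset.
def train_val_split(x, y, val_fraction=0.2):
    """Split parallel sequences into training and validation portions."""
    split = int(len(x) * (1 - val_fraction))
    return (x[:split], y[:split]), (x[split:], y[split:])

(x_train, y_train), (x_val, y_val) = train_val_split(list(range(10)), list(range(10)))
print(len(x_train), len(x_val))  # 8 2
# Then, assuming an AutoKeras classifier `clf` and array-like data:
# clf.fit(x_train, y_train, validation_data=(x_val, y_val))
```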
**Description**

Hi! I'm dealing with the same situation where I want to use a generator to read from a massive training set. If generators are not tested, can you recommend an alternative approach? Thanks in advance! 🙌

**Expected Behavior**

Train the model using a tf.data.Dataset built from a generator as the training set. Alternatively, convert the generator-backed dataset into train/test input for AutoKeras.

**Reproducible code**

```python
import autokeras as ak
import numpy as np  # needed for CLASS_NAMES below
import pathlib
import tensorflow as tf

# Define general parameters
BATCH_SIZE = 32
IMG_HEIGHT = 150
IMG_WIDTH = 150

data_dir = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)
data_dir = pathlib.Path(data_dir)

CLASS_NAMES = np.array([item.name for item in data_dir.glob('*') if item.name != "LICENSE.txt"])
# array(['roses', 'sunflowers', 'daisy', 'dandelion', 'tulips'], dtype='<U10')

# ImageDataGenerator using Keras
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Define the parameters
train_data_gen = train_gen.flow_from_directory(
    batch_size=BATCH_SIZE,
    directory=data_dir,
    shuffle=True,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    class_mode='categorical'
)
# Found 3670 images belonging to 5 classes.

# Wrapping the Keras generator
train_ds = tf.data.Dataset.from_generator(
    lambda: train_data_gen,
    output_types=(tf.float32, tf.float32),
    output_shapes=([BATCH_SIZE, IMG_HEIGHT, IMG_WIDTH, 3],
                   [BATCH_SIZE, len(CLASS_NAMES)]))

images, labels = next(train_data_gen)
print(images.dtype, images.shape)
print(labels.dtype, labels.shape)
# float32 (32, 150, 150, 3)
# float32 (32, 5)

# Initialize the image classifier.
clf = ak.ImageClassifier(max_trials=2)
# Feed the image classifier with training data.
clf.fit(train_ds, epochs=2)
```

**Error message**

See console output (click to expand):

---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
in
2 clf = ak.ImageClassifier(max_trials=2)
3 # Feed the image classifier with training data.
----> 4 clf.fit(train_ds, epochs=2)
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/tasks/image.py in fit(self, x, y, epochs, callbacks, validation_split, validation_data, **kwargs)
120 validation_split=validation_split,
121 validation_data=validation_data,
--> 122 **kwargs)
123
124
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in fit(self, x, y, batch_size, epochs, callbacks, validation_split, validation_data, **kwargs)
237 y=y,
238 validation_data=validation_data,
--> 239 validation_split=validation_split)
240
241 # Process the args.
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/auto_model.py in _prepare_data(self, x, y, validation_data, validation_split)
319 if validation_data is None and validation_split:
320 self._split_dataset = True
--> 321 dataset, validation_data = utils.split_dataset(dataset, validation_split)
322 return dataset, validation_data
323
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/autokeras/utils.py in split_dataset(dataset, validation_split)
79 A tuple of two tf.data.Dataset. The training set and the validation set.
80 """
---> 81 num_instances = dataset.reduce(np.int64(0), lambda x, _: x + 1).numpy()
82 if num_instances < 2:
83 raise ValueError('The dataset should at least contain 2 '
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py in reduce(self, initial_state, reduce_func)
1932 f=reduce_func,
1933 output_shapes=structure.get_flat_tensor_shapes(state_structure),
-> 1934 output_types=structure.get_flat_tensor_types(state_structure)))
1935
1936 def unbatch(self):
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py in reduce_dataset(input_dataset, initial_state, other_arguments, f, output_types, output_shapes, use_inter_op_parallelism, name)
4659 pass # Add nodes to the TensorFlow graph.
4660 except _core._NotOkStatusException as e:
-> 4661 _ops.raise_from_not_ok_status(e, name)
4662 # Add nodes to the TensorFlow graph.
4663 if not isinstance(output_types, (list, tuple)):
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in raise_from_not_ok_status(e, name)
6604 message = e.message + (" name: " + name if name is not None else "")
6605 # pylint: disable=protected-access
-> 6606 six.raise_from(core._status_to_exception(e.code, message), None)
6607 # pylint: enable=protected-access
6608
~/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.
Traceback (most recent call last):
File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
ret = func(*args)
File "/Users/cristobalberger/.pyenv/versions/python-3.7.4/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 825, in generator_py_func
"of shape %s was expected." % (ret_array.shape, expected_shape))
ValueError: `generator` yielded an element of shape (22, 150, 150, 3) where an element of shape (32, 150, 150, 3) was expected.
[[{{node PyFunc}}]] [Op:ReduceDataset]

**Setup Details**
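The root cause of the shape error in the traceback is arithmetic: 3,670 images in batches of 32 leave a final partial batch of 22 items, which conflicts with the static batch dimension of 32 declared in output_shapes. A quick pure-Python sketch of the mismatch (the batching helper is illustrative, not AutoKeras code):

```python
def batch_sizes(num_items, batch_size):
    """Sizes of the batches a generator over num_items items yields."""
    full, rem = divmod(num_items, batch_size)
    return [batch_size] * full + ([rem] if rem else [])

sizes = batch_sizes(3670, 32)
print(sizes[-1])   # 22 — the shape (22, 150, 150, 3) in the traceback
print(len(sizes))  # 115 batches in total
# Declaring output_shapes with a batch dimension of None (instead of
# BATCH_SIZE) lets the final partial batch through.
```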
Currently we don't support generators. It doesn't seem to be the mainstream way to do this in the future of tf.keras. We will try to support TFRecords, which we haven't really tested yet.
Hi @haifeng-jin. Thanks for your reply. Are you aware if it's possible to convert it? Thanks
@ciberger, this code seems to work; you can give it a shot:

```python
import tensorflow as tf
import numpy as np
import autokeras as ak
from tensorflow.keras.preprocessing import image
import pathlib
import matplotlib.pylab as plt

BATCH_SIZE = 32
IMG_HEIGHT = 224
IMG_WIDTH = 224

data_dir = "dataset/train"
data_dir = pathlib.Path(data_dir)
#image_count = len(list(data_dir.glob('*/*.jpg')))
#STEPS_PER_EPOCH = np.ceil(image_count/BATCH_SIZE)

def preprocess(img):
    img = image.array_to_img(img, scale=False)
    img = img.resize((IMG_WIDTH, IMG_HEIGHT))
    img = image.img_to_array(img)
    return img / 255.0

# Note: rescale=1./255 and the preprocessing function both divide by 255,
# so pixel values are scaled twice; drop one of the two if that matters.
image_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    validation_split=0.2,
    preprocessing_function=preprocess)

train_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='training'
)
val_generator = image_generator.flow_from_directory(
    directory=str(data_dir),
    batch_size=BATCH_SIZE,
    shuffle=True,
    #class_mode="categorical",
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    subset='validation'
)

def callable_iterator(generator):
    for img_batch, targets_batch in generator:
        yield img_batch, targets_batch

train_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(train_generator),
    output_types=(tf.float32, tf.float32))
val_dataset = tf.data.Dataset.from_generator(
    lambda: callable_iterator(val_generator),
    output_types=(tf.float32, tf.float32))

# for img, label in train_dataset.take(1):
#     print("Image shape: ", img.numpy().shape)
#     print("Label: ", label.numpy())
#     plt.imshow(img.numpy()[0] * 255)
#     plt.show()

clf = ak.ImageClassifier(max_trials=10)
# Feed the tensorflow Dataset to the classifier.
clf.fit(train_dataset, epochs=60)
# Evaluate the best model.
print(clf.evaluate(val_dataset))
```

Then you can feed your fit function with the tf.data datasets.
Your example shows no output for me. I'm using Google Colab with GPU support, and I have gotten it working before (albeit with bad predictive performance). UPDATE: I know the generator works when I test it using the adapted code above.
I downloaded a small set of MNIST images stored in a local folder to try your method, but it just showed the following message and then the system hung.
While the system hung, I could see the process using a lot of CPU (top showed it near the top of the %CPU column). Could you tell me how to deal with this situation?
Currently, we are short of hands. It would be great if anyone could contribute an example on TFRecord and tf.data. @naitslup has provided this example (#1060 (comment)), which would be a good start.
Happy to do this.
Getting an error.
@haifeng-jin Can I contribute a tutorial for using an image generator with AutoKeras?
@cibic89 Yes. Thank you! Please put it in this folder: https://github.com/keras-team/autokeras/tree/master/docs/py
@abdulsam Hi Abdus, it seems you have encountered a lot of issues. Can we set up a call to help you resolve them? You could also help us answer some of our user study questions and provide your feedback. You can reach me on Slack. Thank you!
@haifeng-jin Could tensorflow.keras.utils.Sequence be another way to support this? It's just like the torch Dataset class and may be easy to apply in code. I used Sequence for a huge numpy array stored in an h5py file; the data is only read into RAM when __getitem__ is called.
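A minimal sketch of the Sequence idea described above. The plain class below mirrors only the __len__/__getitem__ contract so the batch indexing can be shown without TensorFlow installed; in real use you would subclass tf.keras.utils.Sequence and open the h5py file in __init__ (the class name and toy data are illustrative):

```python
import math

class LazyBatches:  # in real use: class LazyBatches(tf.keras.utils.Sequence)
    """Serve fixed-size batches by index; data stays on disk until sliced."""
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        # With h5py-backed x and y, this slice reads only one batch into RAM.
        return self.x[sl], self.y[sl]

seq = LazyBatches(list(range(100)), list(range(100)), batch_size=32)
print(len(seq))        # 4
print(len(seq[3][0]))  # 4 — the final partial batch
```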
@GuileCyclone Thank you! I will look that up.
I just documented a problem with
Python generators are supported in the 1.0.10 release. Please let us know if it doesn't work for your case.
Hi. I know this is several months late, but I am not getting any output, just as you describe. What does it have to do with AutoKeras? The issue would be with matplotlib, since that is the library responsible for displaying the image. What am I missing? Thank you.
An AutoKeras contributor said so here. It could be what you said; it could also be something else.
Well, I am getting output now. But clf.fit isn't working: it says "ValueError: Cannot take the length of shape with unknown rank." on the line clf.fit(train_dataset, epochs=60), which is the same issue that abdulsam faced. What's the workaround? The link you gave doesn't really explain much; it is a base-case tutorial.
@VictorReaver1999 I was also getting the "Cannot take the length of shape with unknown rank" error. The issue is that autokeras.utils.data_utils.batched doesn't know what to do with the generator. Since the generators in the example are indeed outputting batches, just monkey-patching this function to return True will get you up and running for now.
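The monkey-patch described above would look roughly like the following. Hedged: autokeras.utils.data_utils.batched is named in the comment, but its exact signature may differ between versions, so the stand-in namespace below only demonstrates the patching mechanics (in real code you would assign to the actual autokeras module attribute):

```python
import types

# Stand-in for autokeras.utils.data_utils, purely to show the mechanics.
data_utils = types.SimpleNamespace(batched=lambda dataset: False)

print(data_utils.batched("toy-dataset"))  # False — original behaviour

# Monkey-patch: the generators in the example always emit whole batches,
# so force the check to report "already batched".
data_utils.batched = lambda dataset: True
print(data_utils.batched("toy-dataset"))  # True
```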
Update (above left for posterity): I found that there was a place downstream that was also failing in spite of the monkey-patch. As I experimented with AutoKeras, I found a cleaner way that both clears the downstream error and avoids the ugly monkey-patching:
Obviously the things starting with
This surfaces the appropriate dimensions to the necessary places in AutoKeras. AutoKeras does not behave well if you give it any incomplete batches, so I had to modify callable_iterator as above to reject incomplete batches from the iterator.
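The "reject incomplete batches" modification described above can be sketched like this (a hypothetical wrapper with toy list-based batches, not the poster's exact code):

```python
def complete_batches_only(generator, batch_size):
    """Yield only (images, targets) pairs whose batch dimension is full."""
    for img_batch, targets_batch in generator:
        if len(img_batch) == batch_size:
            yield img_batch, targets_batch

# Toy data: two full batches of 3 and a trailing partial batch of 2.
batches = [([1, 2, 3], [0, 1, 0]), ([4, 5, 6], [1, 0, 1]), ([7, 8], [0, 1])]
kept = list(complete_batches_only(iter(batches), batch_size=3))
print(len(kept))  # 2 — the partial batch was dropped
```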
Hello @haifeng-jin! I have a problem with training getting stuck.

Versions
Hardware
I would like to run it on my computer, so generators are important for me, also due to hardware restrictions.
Data amount
Code

Option 1

Option 2

In both cases I did not get any results even after 24 hours. Without generators, one can see the first results in the output immediately. Could you please help me with it?
Hi, I have 3 classes and batches of 32. For some reason, when I try to add this patch to the code, it gives me a ValueError. If I also change the values (None, 1) in the patch to (32, 3), it runs but nothing happens at all.
You probably need to use tf.TensorShape([None, 3]) instead of tf.TensorShape([None, 1]) in the example above. In my application I was regressing to a scalar, but in your case you are classifying into one of three classes.
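The reason for the (None, 3) shape: with categorical class mode the generator one-hot encodes the labels, so each label row has one entry per class. A toy illustration (the helper function is hypothetical, just to show the width of a label row):

```python
def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

labels = [one_hot(c, 3) for c in [0, 2, 1]]
print(len(labels), len(labels[0]))  # 3 3 — a batch of 3 rows, 3 classes wide
# Hence tf.TensorShape([None, 3]) for three-class classification, versus
# tf.TensorShape([None, 1]) when regressing to a single scalar.
```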
I have a simple task: find the best CNN architecture for image regression. However, I have a large dataset which cannot be loaded into memory all at once. It seems that in the current release, ImageRegressor only supports a fit method that requires all the data (x and y) to be loaded in memory. How can I use a generator in AutoKeras? I have checked the closed issue #204, but it seems it was not solved.
I have already tried tf.data by converting my generator to a tf.data.Dataset, but it didn't work. For example:
Then I got this error:
Any suggestions are highly appreciated.