
Fixed: #1494 Check optimizer and learning rate settings #1736

Closed
wants to merge 2 commits

Conversation

@jwliou commented Jun 11, 2022

Which issue(s) does this Pull Request fix?

Partially resolves #1494

Details of the Pull Request

In fact, this fix more likely resolves #1480 (with the implicit addition of the rmsprop and adadelta optimizers with default settings).

This pull request checks whether hyperparameter settings for the optimizer and learning rate already exist in the current hp.
The original code simply overwrites any optimizer and learning rate settings already present in hp, so every model has to search over the preset optimizer and learning rate space even if you set these elsewhere, such as in tuners/task_specific.py.

It would be better to serialize the optimizer settings, but where to put the serialization operations will be the next problem.
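
For orientation, a minimal standalone sketch of the check being proposed, using keras_tuner.HyperParameters directly rather than AutoKeras internals (the Fixed call below only simulates a value preset elsewhere, e.g. by initial_hps; the learning rate list is the existing default search space):

import keras_tuner

hp = keras_tuner.HyperParameters()
hp.Fixed("learning_rate", 2e-4)  # simulate a value preset elsewhere, e.g. by initial_hps

# Only fall back to the default search space when the value is not already in hp.
if "learning_rate" in hp.values.keys():
    learning_rate = hp.get("learning_rate")
else:
    learning_rate = hp.Choice(
        "learning_rate", [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5], default=1e-3
    )

print(learning_rate)  # 0.0002, the preset value is respected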

Check optimizer and learning rate settings before hard-coding default settings for optimizer and learning rate.
@google-cla bot commented Jun 11, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@Anselmoo (Contributor) commented:

See also: #1734

@jwliou (Author) commented Jun 14, 2022

The optimizer should be treated as one larger block of settings that includes the learning rate, instead of considering the optimizer and learning rate separately.
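
As a rough illustration of that point (plain tf.keras, not AutoKeras code): a Keras optimizer already carries its learning rate inside its own config, so treating "optimizer" as one block of settings is natural.

from tensorflow import keras

# The optimizer's config bundles the learning rate with its other settings.
opt = keras.optimizers.Adam(learning_rate=2e-4)
config = opt.get_config()
print(config["learning_rate"])  # the learning rate travels with the optimizer config

# The same config can rebuild an equivalent optimizer later.
rebuilt = keras.optimizers.Adam.from_config(config)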

@jwliou jwliou marked this pull request as ready for review June 17, 2022 10:42
@haifeng-jin (Collaborator) left a comment

Thanks for the PR!
Please address the comments.
After that, this PR should only contain the two added optimizers.
Please run sh shell/format.sh to format the code. Thanks.

Comment on lines +283 to +285
if "optimizer" in hp.values.keys():
optimizer_name = hp.get("optimizer")
else:
Collaborator:

We don't need the check here. If the optimizer is already defined in the hp, it will not override its search space but will directly use the old value, even if it is different from the new one.
You may give it a try.

@jwliou (Author):

But if the hp comes with certain default values for the optimizer and learning rate,
those values will be overwritten by the so-called "search space."

In autokeras/tuners/task_specific.py, there is a setting like this:

IMAGE_CLASSIFIER = [
    {
        "image_block_1/block_type": "vanilla",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0.5,
        "optimizer": "adam",
        "learning_rate": 1e-3,
    },
    {
        "image_block_1/block_type": "resnet",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 1e-3,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 2e-5,
    },
]

What will happen if I modify the learning rate to something like 2e-4, which is not within
the default learning rate range [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5]?

The modified code with the check will lead to...
learning_rate = hp.get("learning_rate") => gets the learning_rate setting of 2e-4.

But...
learning_rate = hp.Choice(
    "learning_rate", [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5],
    default=1e-3,
)
can never set learning_rate to 2e-4; that is the problem.

The same story goes for the optimizer name, but since all the allowed optimizer names
were already listed before this pull request, the previous statement may work in some sense,
although the implicit problem still exists.

Collaborator:

So you intend to use this with the ImageClassifier() API or something similar? Would you please give a short code example to illustrate how a user would use this feature you contributed?

Thanks!

@jwliou (Author) commented Jun 23, 2022

Tested example as follows:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import autokeras as ak
from autokeras.tuners import greedy

# Custom setting of Image Classifier Tuner
MY_IMAGE_CLASSIFIER = [
    {
        "image_block_1/block_type": "resnet",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": True,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/res_net_block_1/pretrained": False,
        "image_block_1/res_net_block_1/version": "resnet50",
        "image_block_1/res_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 5e-4,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": False,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/efficient_net_block_1/pretrained": True,
        "image_block_1/efficient_net_block_1/version": "b3",
        "image_block_1/efficient_net_block_1/trainable": True,
        "image_block_1/efficient_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "rmsprop",
        "learning_rate": 1e-4,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": False,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/efficient_net_block_1/pretrained": True,
        "image_block_1/efficient_net_block_1/version": "b4",
        "image_block_1/efficient_net_block_1/trainable": True,
        "image_block_1/efficient_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 5e-5,
    },
]

class MyImageClassifierTuner(greedy.Greedy):
    def __init__(self, **kwargs):
        super().__init__(initial_hps=MY_IMAGE_CLASSIFIER, **kwargs)

# Loading cifar dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)
print(x_test.shape)

# Apply custom tuner in ImageClassifier to select from custom models
clf = ak.ImageClassifier(max_trials=3, tuner=MyImageClassifierTuner)

# Training, testing and print fitting results.
clf.fit(x_train, y_train, epochs=10, batch_size=16)
predicted = clf.predict(x_test[:10])

labels = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

fig = plt.figure(figsize=(16, 6))
for i in range(10):
    ax = fig.add_subplot(2, 5, i + 1)
    ax.set_axis_off()
    ax.imshow(x_test[i])
    ax.set_title(f'Predicted: {labels[int(predicted[i])]},\nReal: {labels[int(y_test[i])]}')

plt.tight_layout()
plt.show()

model = clf.export_model()
model.summary()

Collaborator:

This is too complicated for other users. Let's just add an optimizer argument to the AutoModel class.
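
For reference, a hypothetical sketch of what that suggestion could look like from the user's side; the optimizer argument shown here does not exist on AutoModel at the time of this PR and is purely an assumption:

import autokeras as ak
from tensorflow import keras

input_node = ak.ImageInput()
output_node = ak.ClassificationHead()(ak.ImageBlock()(input_node))

# Hypothetical: pass the optimizer (and its learning rate) directly to AutoModel
# instead of encoding it through a custom tuner's initial_hps.
auto_model = ak.AutoModel(
    inputs=input_node,
    outputs=output_node,
    max_trials=3,
    optimizer=keras.optimizers.Adam(learning_rate=5e-4),  # not a real argument yet
)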

@jwliou (Author) commented Jun 27, 2022

OK... I thought such a simple example would show the difference this pull request makes.
But if the original behavior is still considered better... I have nothing more to say, and I'll just create another branch for private use.

Edited: After some testing, I found that even with this patch the optimizer may still be overwritten...
so the fix needs to go even deeper. Let me see if Hyperband has better luck...

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import autokeras as ak

# Loading cifar dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)
print(x_test.shape)

# Change this from 3 to 9, 12, or 27; the difference will then be shown.
nc = 9
# clf = ak.ImageClassifier(max_trials=3)
clf = ak.ImageClassifier(max_trials=nc)

# Train and print the tuner's results summary.
clf.fit(x_train, y_train, epochs=10, batch_size=16)
clf.tuner.results_summary(num_trials=nc)

@jwliou (Author):

Status update: For the complex example, I changed max_trials to 9 and made some modifications to look at the models selected by the tuner; the output is below.
Some unexpected results are marked in bold.

  1. An unexpected b2 is encountered, since the original complex setting did not include EfficientNet b2.
  2. An unexpected xception is encountered; I did not even include xception in my complex settings.

It looks like there is still a long way to go if I really want to fix the behavior to match my original expectation.

Results summary
Results in ./image_classifier
Showing 9 best trials
<keras_tuner.engine.objective.Objective object at 0x7f640736f550>
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.088919498026371
Trial summary
Hyperparameters:
image_block_1/block_type: efficient
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
Score: 0.10178251564502716
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: **b2**
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10402590036392212
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10415516793727875
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: False
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10568884015083313
Trial summary
Hyperparameters:
image_block_1/block_type: efficient
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/version: b3
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: rmsprop
learning_rate: 0.0001
Score: 0.18116365373134613
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.1
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.19588683545589447
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: True
image_block_1/block_type: **xception**
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/xception_block_1/pretrained: False
image_block_1/xception_block_1/imagenet_size: False
Score: 0.7607452273368835
Trial summary
Hyperparameters:
image_block_1/block_type: resnet
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: True
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/res_net_block_1/pretrained: False
image_block_1/res_net_block_1/version: resnet50
image_block_1/res_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 0.0005
Score: 0.7997666597366333

Comment on lines +294 to +296
if "learning_rate" in hp.values.keys():
learning_rate = hp.get("learning_rate")
else:
Collaborator:

Same here, we don't need the check.

Wrapped long lines to fulfil the coding style requirement.
Comment on lines +306 to +309
elif optimizer_name == "rmsprop":
    optimizer = keras.optimizers.RMSprop(learning_rate=learning_rate)
elif optimizer_name == "adadelta":
    optimizer = keras.optimizers.Adadelta(learning_rate=learning_rate)
Contributor:

Is there any specific reason why only these two solvers are added?

https://keras.io/api/optimizers/

If Adadelta is added, why not add Adagrad, too?
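
Just to illustrate how further built-in optimizers could be wired in alongside the two added here, a small standalone sketch (Adagrad and Ftrl are not part of this PR):

from tensorflow import keras

# Name-to-class map for a few more built-in Keras optimizers; only rmsprop and
# adadelta are actually added by this PR, the rest are illustrative.
OPTIMIZERS = {
    "rmsprop": keras.optimizers.RMSprop,
    "adadelta": keras.optimizers.Adadelta,
    "adagrad": keras.optimizers.Adagrad,
    "ftrl": keras.optimizers.Ftrl,
}

optimizer = OPTIMIZERS["adagrad"](learning_rate=1e-3)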

@jwliou (Author):

In fact, adding more optimizers of the Adam family was not my main purpose. I may have added adadelta simply because I had seen it mentioned in code comments.

RMSprop is mainly a personal preference. Besides variations of Adam, the optimizers implemented in Keras are SGD, RMSprop, and Ftrl according to the official documentation. I wonder whether the last one might be preferred by some other programmers.

For the more general request of a custom optimizer, it still comes back to my previous comment:
serialization of the optimizer settings.
Such serialization would treat the learning rate as one parameter within the optimizer settings,
and you would need to implement your own optimizer as a subclass of the Keras Optimizer.
Furthermore, a learning rate scheduler might also be considered.

A possible reference or example for serialization is loss or metric, which implement serialization in engine/head.py.
Optimizer may need to follow the same steps in the long term, but it may not be suitable to implement it in the same place as loss and metric, since it is reasonable to assume one model/graph has only one universal optimizer, while one model/graph may have multiple heads. Maybe I am wrong here.

And optimizer serialization may involve more than just adding extra code in utils/type.py in this simple way:

from typing import Callable, Union

# loss, for example, already has a type alias like this
from tensorflow.keras.losses import Loss

LossType = Union[str, Callable, Loss]

# the optimizer could share the same kind of type definition
from tensorflow.keras.optimizers import Optimizer

OptimizerType = Union[str, Callable, Optimizer]

The true purpose of this request may need more effort to accomplish, though.
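
For what the serialization path could look like with the standard Keras utilities, a minimal sketch (plain tf.keras, not AutoKeras code; only an illustration of the idea above):

from tensorflow import keras

# An optimizer instance, including its learning rate, serializes to a plain config dict...
opt = keras.optimizers.RMSprop(learning_rate=1e-4)
config = keras.optimizers.serialize(opt)

# ...and can be rebuilt from that dict later.
restored = keras.optimizers.deserialize(config)

# Strings are also accepted; keras.optimizers.get resolves str, dict, or Optimizer,
# which is what an OptimizerType = Union[str, Callable, Optimizer] alias would lean on.
also_ok = keras.optimizers.get("adam")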

Collaborator:

Hey!
Would you be willing to work on adding functionality for passing the loss and optimizer, in the form of tunable hyperparameters, as arguments to AutoModel?

@jwliou (Author):

Well, it is possible if overwriting the optimizer can be avoided, but something weird happened in the real code testing above.

@haifeng-jin haifeng-jin closed this Dec 2, 2022

Successfully merging this pull request may close these issues.

add compile function to automodel