
Fixed: #1494 Check optimizer and learning rate settings #1736

Closed
wants to merge 2 commits

Conversation

@jwliou commented Jun 11, 2022

Which issue(s) does this Pull Request fix?

Partially resolves #1494

Details of the Pull Request

In fact, this fix more likely resolves #1480 (with the implicit addition of the rmsprop and adadelta optimizers with default settings).

This pull request checks whether hyperparameter settings for the optimizer and learning rate already exist in the current hp.
The original code simply overwrites any optimizer and learning rate settings already present in hp, so every model has to search over the preset optimizer and learning rate space even if you set these elsewhere, such as in tuners/task_specific.py.

It would be better to serialize the optimizer settings, but where to put the serialization operations will be the next problem.
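
For orientation, a minimal standalone sketch of the check being proposed, using keras_tuner.HyperParameters directly rather than AutoKeras internals (the Fixed call below only simulates a value preset elsewhere, e.g. by initial_hps; the learning rate list is the existing default search space):

import keras_tuner

hp = keras_tuner.HyperParameters()
hp.Fixed("learning_rate", 2e-4)  # simulate a value preset elsewhere, e.g. by initial_hps

# Only fall back to the default search space when the value is not already in hp.
if "learning_rate" in hp.values.keys():
    learning_rate = hp.get("learning_rate")
else:
    learning_rate = hp.Choice(
        "learning_rate", [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5], default=1e-3
    )

print(learning_rate)  # 0.0002, the preset value is respected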

Check optimizer and learning rate settings before hard-coding default settings for optimizer and learning rate.
@google-cla bot commented Jun 11, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@Anselmoo (Contributor) commented:

See also: #1734

@jwliou (Author) commented Jun 14, 2022

The optimizer should be treated as one larger block of settings that includes the learning rate, instead of considering the optimizer and learning rate separately.
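
As a rough illustration of that point (plain tf.keras, not AutoKeras code): a Keras optimizer already carries its learning rate inside its own config, so treating "optimizer" as one block of settings is natural.

from tensorflow import keras

# The optimizer's config bundles the learning rate with its other settings.
opt = keras.optimizers.Adam(learning_rate=2e-4)
config = opt.get_config()
print(config["learning_rate"])  # the learning rate travels with the optimizer config

# The same config can rebuild an equivalent optimizer later.
rebuilt = keras.optimizers.Adam.from_config(config)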

@jwliou jwliou marked this pull request as ready for review June 17, 2022 10:42
@haifeng-jin (Collaborator) left a comment

Thanks for the PR!
Please address the comments.
After that, this PR should only contain the two added optimizers.
Please run sh shell/format.sh to format the code. Thanks.

Comment on lines +283 to +285
if "optimizer" in hp.values.keys():
optimizer_name = hp.get("optimizer")
else:
Collaborator:

We don't need the check here. If the optimizer is already defined in the hp, it will not override its search space but will directly use the old value, even if it is different from the new one.
You may give it a try.

@jwliou (Author):

But if the hp comes with certain default values for the optimizer and learning rate,
those values will be overwritten by the so-called "search space."

In autokeras/tuners/task_specific.py, there is a setting like this:

IMAGE_CLASSIFIER = [
    {
        "image_block_1/block_type": "vanilla",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0.5,
        "optimizer": "adam",
        "learning_rate": 1e-3,
    },
    {
        "image_block_1/block_type": "resnet",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 1e-3,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        # some irrelevant settings omitted for shorter reply.
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 2e-5,
    },
]

What will happen if I modify the learning rate to something like 2e-4, which is not within
the default learning rate range [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5]?

The modified code with the check will lead to...
learning_rate = hp.get("learning_rate") => gets the learning_rate setting of 2e-4.

But...
learning_rate = hp.Choice(
    "learning_rate", [1e-1, 1e-2, 1e-3, 1e-4, 2e-5, 1e-5],
    default=1e-3,
)
can never set learning_rate to 2e-4; that is the problem.

The same story goes for the optimizer name, but since all the allowed optimizer names
were already listed before this pull request, the previous statement may work in some sense,
although the implicit problem still exists.

Collaborator:

So you intend to use this with the ImageClassifier() API or something similar? Would you please give a short code example to illustrate how a user would use this feature you contributed?

Thanks!

@jwliou (Author) commented Jun 23, 2022

Tested example as follows:

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import autokeras as ak
from autokeras.tuners import greedy

# Custom setting of Image Classifier Tuner
MY_IMAGE_CLASSIFIER = [
    {
        "image_block_1/block_type": "resnet",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": True,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/res_net_block_1/pretrained": False,
        "image_block_1/res_net_block_1/version": "resnet50",
        "image_block_1/res_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 5e-4,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": False,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/efficient_net_block_1/pretrained": True,
        "image_block_1/efficient_net_block_1/version": "b3",
        "image_block_1/efficient_net_block_1/trainable": True,
        "image_block_1/efficient_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "rmsprop",
        "learning_rate": 1e-4,
    },
    {
        "image_block_1/block_type": "efficient",
        "image_block_1/normalize": True,
        "image_block_1/augment": True,
        "image_block_1/image_augmentation_1/horizontal_flip": True,
        "image_block_1/image_augmentation_1/vertical_flip": False,
        "image_block_1/image_augmentation_1/contrast_factor": 0.0,
        "image_block_1/image_augmentation_1/rotation_factor": 0.0,
        "image_block_1/image_augmentation_1/translation_factor": 0.1,
        "image_block_1/image_augmentation_1/zoom_factor": 0.0,
        "image_block_1/efficient_net_block_1/pretrained": True,
        "image_block_1/efficient_net_block_1/version": "b4",
        "image_block_1/efficient_net_block_1/trainable": True,
        "image_block_1/efficient_net_block_1/imagenet_size": True,
        "classification_head_1/spatial_reduction_1/reduction_type": "global_avg",
        "classification_head_1/dropout": 0,
        "optimizer": "adam",
        "learning_rate": 5e-5,
    },
]

class MyImageClassifierTuner(greedy.Greedy):
    def __init__(self, **kwargs):
        super().__init__(initial_hps=MY_IMAGE_CLASSIFIER, **kwargs)

# Loading cifar dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)
print(x_test.shape)

# Apply custom tuner in ImageClassifier to select from custom models
clf = ak.ImageClassifier(max_trials=3, tuner=MyImageClassifierTuner)

# Training, testing and print fitting results.
clf.fit(x_train, y_train, epochs=10, batch_size=16)
predicted = clf.predict(x_test[:10])

labels = ('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

fig = plt.figure(figsize=(16, 6))
for i in range(10):
    ax = fig.add_subplot(2, 5, i + 1)
    ax.set_axis_off()
    ax.imshow(x_test[i])
    ax.set_title(f'Predicted: {labels[int(predicted[i])]},\nReal: {labels[int(y_test[i])]}')

plt.tight_layout()
plt.show()

model = clf.export_model()
model.summary()

Collaborator:

This is too complicated for other users. Let's just add an optimizer argument to the AutoModel class.
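
For reference, a hypothetical sketch of what that suggestion could look like from the user's side; the optimizer argument shown here does not exist on AutoModel at the time of this PR and is purely an assumption:

import autokeras as ak
from tensorflow import keras

input_node = ak.ImageInput()
output_node = ak.ClassificationHead()(ak.ImageBlock()(input_node))

# Hypothetical: pass the optimizer (and its learning rate) directly to AutoModel
# instead of encoding it through a custom tuner's initial_hps.
auto_model = ak.AutoModel(
    inputs=input_node,
    outputs=output_node,
    max_trials=3,
    optimizer=keras.optimizers.Adam(learning_rate=5e-4),  # not a real argument yet
)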

@jwliou (Author) commented Jun 27, 2022

OK... I thought such a simple example would show the difference this pull request makes.
But if the original behavior is still considered better... I have nothing more to say, and I'll just create another branch for private use.

Edited: After some testing, I found that even with this patch the optimizer may still be overwritten...
so the fix needs to go even deeper. Let me see if Hyperband has better luck...

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
import autokeras as ak

# Loading cifar dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)
print(x_test.shape)

# Change this from 3 to 9, 12, or 27; the difference will then be shown.
nc = 9
# clf = ak.ImageClassifier(max_trials=3)
clf = ak.ImageClassifier(max_trials=nc)

# Train and print the tuner's results summary.
clf.fit(x_train, y_train, epochs=10, batch_size=16)
clf.tuner.results_summary(num_trials=nc)

@jwliou (Author):

Status update: For the complex example, I changed max_trials to 9 and made some modifications to look at the models selected by the tuner; the output is below.
Some unexpected results are marked in bold.

  1. An unexpected b2 is encountered, since the original complex setting did not include EfficientNet b2.
  2. An unexpected xception is encountered; I did not even include xception in my complex settings.

It looks like there is still a long way to go if I really want to fix the behavior to match my original expectation.

Results summary
Results in ./image_classifier
Showing 9 best trials
<keras_tuner.engine.objective.Objective object at 0x7f640736f550>
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.088919498026371
Trial summary
Hyperparameters:
image_block_1/block_type: efficient
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
Score: 0.10178251564502716
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: **b2**
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10402590036392212
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10415516793727875
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: False
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.10568884015083313
Trial summary
Hyperparameters:
image_block_1/block_type: efficient
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/version: b3
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: rmsprop
learning_rate: 0.0001
Score: 0.18116365373134613
Trial summary
Hyperparameters:
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/block_type: efficient
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.1
image_block_1/efficient_net_block_1/pretrained: True
image_block_1/efficient_net_block_1/trainable: True
image_block_1/efficient_net_block_1/version: b4
image_block_1/efficient_net_block_1/imagenet_size: True
Score: 0.19588683545589447
Trial summary
Hyperparameters:
image_block_1/normalize: False
image_block_1/augment: True
image_block_1/block_type: **xception**
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 5e-05
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: False
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/xception_block_1/pretrained: False
image_block_1/xception_block_1/imagenet_size: False
Score: 0.7607452273368835
Trial summary
Hyperparameters:
image_block_1/block_type: resnet
image_block_1/normalize: True
image_block_1/augment: True
image_block_1/image_augmentation_1/horizontal_flip: True
image_block_1/image_augmentation_1/vertical_flip: True
image_block_1/image_augmentation_1/contrast_factor: 0.0
image_block_1/image_augmentation_1/rotation_factor: 0.0
image_block_1/image_augmentation_1/translation_factor: 0.1
image_block_1/image_augmentation_1/zoom_factor: 0.0
image_block_1/res_net_block_1/pretrained: False
image_block_1/res_net_block_1/version: resnet50
image_block_1/res_net_block_1/imagenet_size: True
classification_head_1/spatial_reduction_1/reduction_type: global_avg
classification_head_1/dropout: 0
optimizer: adam
learning_rate: 0.0005
Score: 0.7997666597366333

Comment on lines +294 to +296
if "learning_rate" in hp.values.keys():
learning_rate = hp.get("learning_rate")
else:
Collaborator:

Same here, we don't need the check.

Wrapped long lines to fulfil the coding style requirement.
Comment on lines +306 to +309
elif optimizer_name == "rmsprop":
    optimizer = keras.optimizers.RMSprop(learning_rate=learning_rate)
elif optimizer_name == "adadelta":
    optimizer = keras.optimizers.Adadelta(learning_rate=learning_rate)
Contributor:

Is there any specific reason why only these two solvers are added?

https://keras.io/api/optimizers/

If Adadelta is added, why not add Adagrad, too?
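
Just to illustrate how further built-in optimizers could be wired in alongside the two added here, a small standalone sketch (Adagrad and Ftrl are not part of this PR):

from tensorflow import keras

# Name-to-class map for a few more built-in Keras optimizers; only rmsprop and
# adadelta are actually added by this PR, the rest are illustrative.
OPTIMIZERS = {
    "rmsprop": keras.optimizers.RMSprop,
    "adadelta": keras.optimizers.Adadelta,
    "adagrad": keras.optimizers.Adagrad,
    "ftrl": keras.optimizers.Ftrl,
}

optimizer = OPTIMIZERS["adagrad"](learning_rate=1e-3)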

@jwliou (Author):

In fact, adding more optimizers of the Adam family was not my main purpose. I may have added adadelta simply because I had seen it mentioned in code comments.

RMSprop is mainly a personal preference. Besides variations of Adam, the optimizers implemented in Keras are SGD, RMSprop, and Ftrl according to the official documentation. I wonder whether the last one might be preferred by some other programmers.

For the more general request of a custom optimizer, it still comes back to my previous comment:
serialization of the optimizer settings.
Such serialization would treat the learning rate as one parameter within the optimizer settings,
and you would need to implement your own optimizer as a subclass of the Keras Optimizer.
Furthermore, a learning rate scheduler might also be considered.

A possible reference or example for serialization is loss or metric, which implement serialization in engine/head.py.
Optimizer may need to follow the same steps in the long term, but it may not be suitable to implement it in the same place as loss and metric, since it is reasonable to assume one model/graph has only one universal optimizer, while one model/graph may have multiple heads. Maybe I am wrong here.

And optimizer serialization may involve more than just adding extra code in utils/type.py in this simple way:

from typing import Callable, Union

# loss, for example, already has a type alias like this
from tensorflow.keras.losses import Loss

LossType = Union[str, Callable, Loss]

# the optimizer could share the same kind of type definition
from tensorflow.keras.optimizers import Optimizer

OptimizerType = Union[str, Callable, Optimizer]

The true purpose of this request may need more effort to accomplish, though.
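
For what the serialization path could look like with the standard Keras utilities, a minimal sketch (plain tf.keras, not AutoKeras code; only an illustration of the idea above):

from tensorflow import keras

# An optimizer instance, including its learning rate, serializes to a plain config dict...
opt = keras.optimizers.RMSprop(learning_rate=1e-4)
config = keras.optimizers.serialize(opt)

# ...and can be rebuilt from that dict later.
restored = keras.optimizers.deserialize(config)

# Strings are also accepted; keras.optimizers.get resolves str, dict, or Optimizer,
# which is what an OptimizerType = Union[str, Callable, Optimizer] alias would lean on.
also_ok = keras.optimizers.get("adam")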

Collaborator:

Hey!
Would you be willing to work on adding functionality for passing the loss and optimizer, in the form of tunable hyperparameters, as arguments to AutoModel?

@jwliou (Author):

Well, it is possible if overwriting the optimizer can be avoided, but something weird happened in the real code testing above.

@haifeng-jin haifeng-jin closed this Dec 2, 2022

Successfully merging this pull request may close these issues.

add compile function to automodel