
AttributeError when calling model.fit() with AdamW optimizer on Apple Silicon #176

Closed
anton-bogomazov opened this issue Jun 13, 2023 · 12 comments

@anton-bogomazov

System information.

  • Have I written custom code (as opposed to using a stock example script provided in Keras): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Ventura Version 13.4 (22F66), Apple M1 Pro
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.14.0-dev20230612
  • Python version: 3.11.1
  • Bazel version (if compiling from source): n/a
  • GPU model and memory: n/a
  • Exact command to reproduce: check below

Describe the problem.
Calling model.fit() on a model compiled with the AdamW optimizer raises an AttributeError. On Apple Silicon, Keras tries to fall back to the legacy version of each optimizer, but no legacy version of AdamW exists.
https://github.com/keras-team/keras/blob/5849a0953a644bd6af51b672b32a235510d4f43d/keras/optimizers/__init__.py#LL300C1-L315C59

Same issue description: https://developer.apple.com/forums/thread/731019

Describe the current behavior.
model.fit() fails with AttributeError: 'str' object has no attribute 'minimize' when the model is compiled with the AdamW optimizer.

Describe the expected behavior.
Check whether a legacy version of the optimizer exists; if it does not, skip the fallback, keep the standard version, and print a warning. A sketch of this check is shown below.
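
A minimal sketch of the proposed check (hypothetical: the helper name _maybe_fall_back_to_legacy and the from_config round-trip are assumptions for illustration; the real fallback logic lives in keras/optimizers/__init__.py and differs in detail):

import logging
import platform

from keras.optimizers import legacy as legacy_optimizers

def _maybe_fall_back_to_legacy(optimizer):
    """On Apple Silicon, fall back to a legacy optimizer only if one exists."""
    if not (platform.system() == "Darwin" and platform.processor() == "arm"):
        return optimizer
    # Look up a legacy class with the same name, e.g. Adam -> legacy.Adam.
    name = optimizer.__class__.__name__
    legacy_class = getattr(legacy_optimizers, name, None)
    if legacy_class is None:
        # No legacy counterpart exists (e.g. AdamW): keep the standard
        # optimizer instead of handing model.fit() an unusable fallback.
        logging.warning(
            "No legacy implementation of %s exists; using the v2.11+ "
            "optimizer, which may run slowly on M1/M2 Macs.", name)
        return optimizer
    return legacy_class.from_config(optimizer.get_config())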

Contributing.

  • Do you want to contribute a PR? (yes/no): yes
  • If yes, please read this page for instructions
  • Briefly describe your candidate solution (if contributing): check whether a legacy version exists for the optimizer; if not, skip the fallback, use the standard version, and print a warning (see the sketch under "Describe the expected behavior" above).

Standalone code to reproduce the issue.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from keras.optimizers.experimental import AdamW
print(f'Tensorflow version: {tf.__version__}')

# Create and compile a linear model
model = Sequential()
model.add(Dense(1, input_dim=1, activation='linear'))
model.compile(optimizer=AdamW(learning_rate=0.001, weight_decay=0.001),
              loss='mean_squared_error')
# Generate some dummy data
X_train = tf.random.uniform(shape=(100, 1), minval=-1, maxval=1)
y_train = 2 * X_train + 1
# Fit and predict
model.fit(X_train, y_train, epochs=10, batch_size=4)
X_test = tf.random.uniform(shape=(10, 1), minval=-1, maxval=1)
y_pred = model.predict(X_test)

Source code / logs.

Tensorflow version: 2.14.0-dev20230612
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.AdamW` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.AdamW`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.AdamW`.
Epoch 1/10
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/adamw_test.py:22
     19 y_train = 2 * X_train + 1
     21 # Train the model
---> 22 model.fit(X_train, y_train, epochs=10, batch_size=4)
     24 # Generate predictions
     25 X_test = tf.random.uniform(shape=(10, 1), minval=-1, maxval=1)

File ~/.pyenv/versions/3.11.1/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File /var/folders/98/w0f3wmg54750lvm2swvb7q700000gn/T/__autograph_generated_filenjw6calw.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

AttributeError: in user code:

    File "/Users/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/keras/src/engine/training.py", line 1338, in train_function  *
        return step_function(self, iterator)
    File "/Users/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/keras/src/engine/training.py", line 1322, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/keras/src/engine/training.py", line 1303, in run_step  **
        outputs = model.train_step(data)
    File "/Users/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/keras/src/engine/training.py", line 1084, in train_step
        self.optimizer.minimize(loss, self.trainable_variables, tape=tape)

    AttributeError: 'str' object has no attribute 'minimize'
@tilakrayal
Collaborator

@anton-bogomazov,
I tried to execute the code mentioned above on both tensorflow v2.12 and tf-nightly (2.14.0-dev20230622), and it executed without any issue/error. Also, instead of using from keras.optimizers.experimental import AdamW, please try from tensorflow.keras.optimizers import AdamW. Kindly find the gist of it here. Thank you!

@onuralpszr

@tilakrayal I have also been following this issue; just to be clear, did you run this on Apple Silicon?

@anton-bogomazov
Author

Thank you, @tilakrayal !
This issue is only reproducible on Apple Silicon; the same code works fine on other platforms, as you correctly validated in Colab. I explained the specific reason in the description: on Apple Silicon, Keras tries to fall back to the legacy version of each optimizer, but no legacy version of AdamW exists. There is no fallback to a legacy AdamW in Colab because the fallback is specific to Apple Silicon, so the bug was not reproduced in your notebook.

@anton-bogomazov
Author

Hello, @tilakrayal !
I'm worried that this issue will get lost among the 'support' issues, so could you please take a look at my message above?
Thank you!

@Stephen-Cobalt

Stephen-Cobalt commented Jul 7, 2023

@anton-bogomazov
If you need AdamW, you can achieve effectively the same behavior by passing the weight_decay parameter to Adam, as in the snippet below. I have also been hitting the same issue on Apple Silicon and have found that to be the best temporary workaround.
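
A minimal sketch of that workaround (this assumes the v2.11+ Adam, which accepts a weight_decay argument for decoupled weight decay; the hyperparameter values and the model are just the ones from the repro above):

from tensorflow.keras.optimizers import Adam

# Decoupled weight decay via Adam's weight_decay argument (v2.11+ API),
# approximating AdamW without needing the missing legacy AdamW.
model.compile(optimizer=Adam(learning_rate=0.001, weight_decay=0.001),
              loss='mean_squared_error')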

@tilakrayal
Collaborator

@anton-bogomazov,
I tried to execute the code mentioned above on both tensorflow v2.12 and tf-nightly (2.14.0-dev20230622), and it executed without any issue/error. Kindly find the gist of it here.
As it is failing only on tensorflow-macos, we request that you raise the concern on the Apple developer forum for a quicker resolution. Thank you!

@Netanelshoshan

@anton-bogomazov
I found a workaround to make AdamW work on Apple Silicon with the latest versions of tensorflow and tensorflow-addons: import AdamW from tensorflow_addons.optimizers instead, as in the snippet below, and you should be good.
I'm not using it with tensorflow-metal, though; the performance impact is significant (at least 4x slower).
Hope this helps! 🙂
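
A minimal sketch of that swap (this assumes tensorflow-addons is installed and reuses the model from the repro above; note that tfa's AdamW takes weight_decay as its first argument):

from tensorflow_addons.optimizers import AdamW

# tensorflow-addons ships its own AdamW implementation, which bypasses
# the Keras legacy-optimizer fallback that breaks on Apple Silicon.
optimizer = AdamW(weight_decay=0.001, learning_rate=0.001)
model.compile(optimizer=optimizer, loss='mean_squared_error')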

@github-actions

github-actions bot commented Aug 3, 2023

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.


@anna-hope

For anyone encountering this, this is fixed in tensorflow-macos==2.14

@ethiel

ethiel commented Nov 23, 2023

For anyone encountering this, this is fixed in tensorflow-macos==2.14

I don't think the issue is fixed: I'm on version 2.15 and I'm still unable to train a BERT model using AdamW on my Apple Silicon M1 Max. The log line "WARNING:absl:At this time, the v2.11+ optimizer tf.keras.optimizers.AdamWeightDecay runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at tf.keras.optimizers.legacy.AdamWeightDecay" is also still there.
