
Messages "Fallback to op-by-op mode because memset node breaks graph update" since Keras 3, don't occur in Keras 2 #19081

Open
SteffenBauer opened this issue Jan 22, 2024 · 4 comments

Comments


SteffenBauer commented Jan 22, 2024

Since Keras 3, messages "Fallback to op-by-op mode because memset node breaks graph update" start to clutter the log output.

Example code, plain vanilla MNIST network:

#!/usr/bin/env python3

import os
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras
import numpy as np

(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype(keras.backend.floatx()) / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype(keras.backend.floatx()) / 255

train_labels = keras.utils.to_categorical(train_labels)
test_labels = keras.utils.to_categorical(test_labels)

def network_basic():
    inp = keras.layers.Input(shape=(28, 28, 1), name='Input')
    x = keras.layers.Conv2D(20, (3, 3), activation='relu', name='Conv_1')(inp)
    x = keras.layers.MaxPooling2D((2, 2), name='Pool_1')(x)
    x = keras.layers.Conv2D(50, (3, 3), activation='relu', name='Conv_2')(x)
    x = keras.layers.MaxPooling2D((2, 2), name='Pool_2')(x)
    x = keras.layers.Flatten()(x)
    x = keras.layers.Dense(500, activation='relu')(x)
    out = keras.layers.Dense(10, activation='softmax', name='predictions')(x)
    network = keras.models.Model(inputs=inp, outputs=out)
    return network

network = network_basic()
network.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
                loss='categorical_crossentropy',
                metrics=['accuracy'])
network.summary()
history = network.fit(train_images, train_labels, epochs=5, batch_size=64, validation_data=(test_images, test_labels))
test_loss, test_acc = network.evaluate(test_images, test_labels)
print()
print("Test loss", test_loss)
print("Test accuracy", test_acc)

Output with Keras 3.0.4:

Model: "functional_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Layer (type)                       ┃ Output Shape                  ┃     Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Input (InputLayer)                 │ (None, 28, 28, 1)             │           0 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ Conv_1 (Conv2D)                    │ (None, 26, 26, 20)            │         200 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ Pool_1 (MaxPooling2D)              │ (None, 13, 13, 20)            │           0 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ Conv_2 (Conv2D)                    │ (None, 11, 11, 50)            │       9,050 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ Pool_2 (MaxPooling2D)              │ (None, 5, 5, 50)              │           0 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ flatten (Flatten)                  │ (None, 1250)                  │           0 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ dense (Dense)                      │ (None, 500)                   │     625,500 │
├────────────────────────────────────┼───────────────────────────────┼─────────────┤
│ predictions (Dense)                │ (None, 10)                    │       5,010 │
└────────────────────────────────────┴───────────────────────────────┴─────────────┘
 Total params: 639,760 (2.44 MB)
 Trainable params: 639,760 (2.44 MB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/5
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1705909017.454655    1642 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
W0000 00:00:1705909017.473550    1642 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
932/938 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.8086 - loss: 0.6093W0000 00:00:1705909028.941317    1643 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
938/938 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.8093 - loss: 0.6070W0000 00:00:1705909030.596956    1641 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
938/938 ━━━━━━━━━━━━━━━━━━━━ 23s 16ms/step - accuracy: 0.8094 - loss: 0.6066 - val_accuracy: 0.9776 - val_loss: 0.0684
Epoch 2/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9793 - loss: 0.0674 - val_accuracy: 0.9865 - val_loss: 0.0406
Epoch 3/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9872 - loss: 0.0411 - val_accuracy: 0.9874 - val_loss: 0.0377
Epoch 4/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9907 - loss: 0.0305 - val_accuracy: 0.9870 - val_loss: 0.0379
Epoch 5/5
938/938 ━━━━━━━━━━━━━━━━━━━━ 8s 8ms/step - accuracy: 0.9915 - loss: 0.0273 - val_accuracy: 0.9896 - val_loss: 0.0320
W0000 00:00:1705909064.679903    1644 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.9856 - loss: 0.0412

Test loss 0.032079923897981644
Test accuracy 0.9896000027656555

Output of the same code with Keras 2.15.0:

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 Input (InputLayer)          [(None, 28, 28, 1)]       0

 Conv_1 (Conv2D)             (None, 26, 26, 20)        200

 Pool_1 (MaxPooling2D)       (None, 13, 13, 20)        0

 Conv_2 (Conv2D)             (None, 11, 11, 50)        9050

 Pool_2 (MaxPooling2D)       (None, 5, 5, 50)          0

 flatten (Flatten)           (None, 1250)              0

 dense (Dense)               (None, 500)               625500

 predictions (Dense)         (None, 10)                5010

=================================================================
Total params: 639760 (2.44 MB)
Trainable params: 639760 (2.44 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/5
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1705909693.429130    2404 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
938/938 [==============================] - 31s 17ms/step - loss: 0.2407 - accuracy: 0.9257 - val_loss: 0.0857 - val_accuracy: 0.9730
Epoch 2/5
938/938 [==============================] - 14s 15ms/step - loss: 0.0645 - accuracy: 0.9800 - val_loss: 0.0516 - val_accuracy: 0.9840
Epoch 3/5
938/938 [==============================] - 14s 15ms/step - loss: 0.0435 - accuracy: 0.9863 - val_loss: 0.0369 - val_accuracy: 0.9872
Epoch 4/5
938/938 [==============================] - 14s 15ms/step - loss: 0.0318 - accuracy: 0.9899 - val_loss: 0.0404 - val_accuracy: 0.9861
Epoch 5/5
938/938 [==============================] - 14s 15ms/step - loss: 0.0262 - accuracy: 0.9918 - val_loss: 0.0307 - val_accuracy: 0.9890
313/313 [==============================] - 2s 8ms/step - loss: 0.0307 - accuracy: 0.9890

Test loss 0.030733594670891762
Test accuracy 0.9890000224113464

These messages also appear in some of the examples on the official Keras webpage, for example here:
https://keras.io/examples/generative/vae/

So it does not seem to be a big deal, but I wanted to report it anyway: it caught my attention, the messages clutter the log output, and it may be worth investigating what is going on here and whether it indicates a deeper issue.

Environment:

  • Ubuntu 22.04, ARM64 Jetson Orin Dev kit
  • Tensorflow 2.15.0
  • Keras 3.0.4 / 2.15.0
@sachinprasadhs
Collaborator

Hi, this warning is not generated or handled by Keras; it comes from the TensorFlow backend you are using.
When TensorFlow uses XLA, it falls back to op-by-op mode for various reasons; one such reason is here: https://github.com/tensorflow/tensorflow/blob/master/third_party/xla/xla/service/gpu/runtime/graph_launch.cc#L637-L639.
You can find several more in the same file, and each is reported as a warning in the output.

@SteffenBauer
Author

I see, I already suspected it was something in the backend. I am just puzzled why the message only appears with Keras 3, not with Keras 2. Both Keras versions log that XLA was used, but only Keras 3 shows the warning.

@qlzh727
Member

qlzh727 commented Jan 25, 2024

Triage notes: one reason might be that Keras 3 uses jit compilation by default, while Keras 2 does not (the user needs to enable jit explicitly), and the warning message probably comes from jit/XLA.
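If that hypothesis is right, a possible workaround (untested on the reporter's Jetson setup) is to opt out of XLA jit compilation via the `jit_compile` argument of `Model.compile()`, restoring the Keras 2 default; optionally, `TF_CPP_MIN_LOG_LEVEL` can hide TensorFlow's C++-side warnings altogether. A minimal sketch, using a toy model in place of `network_basic()`:

```python
import os

# Optional: silence TensorFlow's C++-side INFO/WARNING logs entirely.
# Must be set before tensorflow/keras is imported; "2" keeps ERROR logs.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras

# Tiny stand-in model; substitute network_basic() from the report.
inp = keras.layers.Input(shape=(4,))
out = keras.layers.Dense(2)(inp)
model = keras.models.Model(inputs=inp, outputs=out)

# jit_compile=False disables XLA just-in-time compilation for this model,
# matching the Keras 2 default, so the XLA graph-launch fallback warning
# should no longer be triggered during fit()/evaluate().
model.compile(optimizer="sgd", loss="mse", jit_compile=False)
```

Whether avoiding the fallback warning this way is worth giving up XLA's speedups would need to be measured on the model in question.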

@qlzh727 qlzh727 removed the keras-team-review-pending Pending review by a Keras team member. label Jan 25, 2024
@juneedpk

juneedpk commented Apr 1, 2024

How can we try to avoid this?
