python tracing failure for a tensorflow program #1858

Closed
honggyukim opened this issue Dec 11, 2023 · 3 comments

@honggyukim
Collaborator

honggyukim commented Dec 11, 2023

I've recorded the following tensorflow program with uftrace.

$ cat tensor.py 
#!/usr/bin/env python3

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Build a simple feedforward neural network model
model = models.Sequential()
model.add(layers.Flatten(input_shape=(28, 28, 1)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test Accuracy: {test_acc}')

It works fine without uftrace.

$ python3 -m pip install tensorflow
      ...
$ ./tensor.py 
      ...
Epoch 1/5
750/750 [==============================] - 1s 2ms/step - loss: 0.3309 - accuracy: 0.9057 - val_loss: 0.1732 - val_accuracy: 0.9518
Epoch 2/5
750/750 [==============================] - 1s 1ms/step - loss: 0.1519 - accuracy: 0.9557 - val_loss: 0.1347 - val_accuracy: 0.9626
Epoch 3/5
750/750 [==============================] - 1s 1ms/step - loss: 0.1069 - accuracy: 0.9697 - val_loss: 0.1105 - val_accuracy: 0.9664
Epoch 4/5
750/750 [==============================] - 1s 2ms/step - loss: 0.0814 - accuracy: 0.9766 - val_loss: 0.1027 - val_accuracy: 0.9691
Epoch 5/5
750/750 [==============================] - 1s 2ms/step - loss: 0.0650 - accuracy: 0.9815 - val_loss: 0.0919 - val_accuracy: 0.9728
313/313 [==============================] - 0s 831us/step - loss: 0.0843 - accuracy: 0.9744
Test Accuracy: 0.974399983882904

But it fails when running with uftrace as follows.

$ uftrace record ./tensor.py 
      ...
WARN: process crashed by signal 11: Segmentation fault (si_code: 128)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.14-73-g8343 ( x86_64 dwarf python3 luajit tui perf sched dynamic )
WARN: =====================================
WARN: [1] (<8272c>[8272c] <= <0>[0])
WARN: [0] (<1>[1] <= <0>[0])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
@honggyukim
Collaborator Author

honggyukim commented Dec 12, 2023

The same crash happens with the following simple import statement.

$ cat import-tensor.py
#!/usr/bin/env python3
import tensorflow
$ uftrace record import-tensor.py 
      ...
WARN: Segmentation fault: invalid permission (addr: 0x7fde91800000)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.14-75-g0042 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [1] (<2>[2] <= <0>[0])
WARN: [0] (<1>[1] <= <0>[0])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

yihong0618 added a commit to yihong0618/uftrace that referenced this issue Feb 8, 2024
size, for issue namhyung#1858: `import tensorflow` imports a huge number of default
packages, and `update_dbg_info` then fails with a segmentation fault

Signed-off-by: Yi Hong <zouzou0208@gmail.com>
yihong0618 added a commit to yihong0618/uftrace that referenced this issue Feb 8, 2024
size, for issue namhyung#1858: `import tensorflow` imports a huge number of default
packages, and `update_dbg_info` then fails with a segmentation fault

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
@yihong0618
Contributor

@honggyukim, after #1884 the trace completes:

# DURATION     TID     FUNCTION
            [ 32182] | __main__.<module>() {
   5.347  s [ 32182] |   importlib._bootstrap._find_and_load();
  27.050 ms [ 32182] |   importlib._bootstrap._find_and_load();
   3.497 us [ 32182] |   importlib._bootstrap._handle_fromlist();
 273.711 us [ 32182] |   importlib._bootstrap._find_and_load();
   2.124 us [ 32182] |   importlib._bootstrap._handle_fromlist();
 276.716 us [ 32182] |   importlib._bootstrap._find_and_load();
   2.034 us [ 32182] |   importlib._bootstrap._handle_fromlist();
   6.608  s [ 32182] |   keras.src.datasets.mnist.load_data();
   2.164 us [ 32182] |   numpy.lib.npyio.__del__();
   2.274 us [ 32182] |   ndarray.reshape();
  22.102 ms [ 32182] |   ndarray.astype();
   7.303 us [ 32182] |   ndarray.reshape();
   4.253 ms [ 32182] |   ndarray.astype();
   2.284 ms [ 32182] |   keras.src.utils.np_utils.to_categorical();
 194.884 us [ 32182] |   keras.src.utils.np_utils.to_categorical();
  11.310 ms [ 32182] |   keras.src.engine.training.__new__();
  35.318 ms [ 32182] |   tensorflow.python.trackable.base._method_wrapper();
 885.482 us [ 32182] |   keras.src.engine.base_layer.__new__();
   2.456 ms [ 32182] |   keras.src.layers.reshaping.flatten.__init__();
  20.280 ms [ 32182] |   tensorflow.python.trackable.base._method_wrapper();
 653.539 us [ 32182] |   keras.src.engine.base_layer.__new__();
   5.040 ms [ 32182] |   keras.src.dtensor.utils._wrap_function();
  29.463 ms [ 32182] |   tensorflow.python.trackable.base._method_wrapper();
 657.797 us [ 32182] |   keras.src.engine.base_layer.__new__();
   2.763 ms [ 32182] |   keras.src.dtensor.utils._wrap_function();
  19.374 ms [ 32182] |   tensorflow.python.trackable.base._method_wrapper();
  17.496 ms [ 32182] |   keras.src.utils.traceback_utils.error_handler();
  10.072  s [ 32182] |   keras.src.utils.traceback_utils.error_handler();
 901.858 ms [ 32182] |   keras.src.utils.traceback_utils.error_handler();
   6.923 us [ 32182] |   builtins.print();
  23.167  s [ 32182] | } /* __main__.<module> */
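
To see where the time goes in output like the above, the leaf lines can be parsed and ranked. This is only a rough sketch based on the line format visible in this thread; the regular expression, the helper name `slowest_calls`, and the unit table are assumptions, not part of uftrace.

```python
import re

# Matches leaf lines such as
# "   5.347  s [ 32182] |   importlib._bootstrap._find_and_load();"
LINE_RE = re.compile(
    r"^\s*(\d+\.\d+)\s+(s|ms|us|ns)\s+\[\s*(\d+)\]\s+\|\s+(.*?)\(\);\s*$"
)
UNIT_TO_SEC = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def slowest_calls(text, top=3):
    """Return the `top` slowest leaf calls as (seconds, function name) pairs."""
    calls = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            value, unit, _tid, name = m.groups()
            calls.append((float(value) * UNIT_TO_SEC[unit], name))
    # Sort by duration, longest first
    return sorted(calls, reverse=True)[:top]


if __name__ == "__main__":
    import sys

    # Usage: pipe the replay text into this script on stdin
    for seconds, name in slowest_calls(sys.stdin.read()):
        print(f"{seconds:10.3f} s  {name}")
```

Lines that open or close a block, such as the `__main__.<module>` entries, are skipped on purpose because they do not end in `();`.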

@honggyukim
Collaborator Author

Hi @yihong0618, I also see that it works fine with your fix! Superb!

yihong0618 added a commit to yihong0618/uftrace that referenced this issue Feb 8, 2024
For issue namhyung#1858: Importing tensorflow triggers the import of
several large default packages, causing the update_dbg_info
function to fail with a Segmentation Fault.

Signed-off-by: Yi Hong <zouzou0208@gmail.com>
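
As a rough way to gauge how much a single import pulls in, one can compare `sys.modules` before and after it. This is a sketch under assumptions: the helper name `newly_imported_modules` is hypothetical, and the module count is only a proxy for how many symbols a tracer would have to record.

```python
import importlib
import sys


def newly_imported_modules(name):
    """Import `name` and return the sorted names of modules it newly loaded."""
    before = set(sys.modules)
    importlib.import_module(name)
    return sorted(set(sys.modules) - before)


if __name__ == "__main__":
    # "json" is used so the sketch runs anywhere; substitute "tensorflow"
    # where it is installed to see how much larger the set becomes.
    loaded = newly_imported_modules("json")
    print(f"{len(loaded)} modules newly loaded")
```

If the module was already imported earlier in the process, the returned list is empty, so this is best run in a fresh interpreter.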
yihong0618 added a commit to yihong0618/uftrace that referenced this issue Feb 8, 2024
For issue namhyung#1858: the logic to increase uftrace_dbginfo_size
should be new_hdr.offset >= uftrace_dbginfo_size instead of
new_hdr.offset >= uftrace_symtab_size.

Signed-off-by: Yi Hong <zouzou0208@gmail.com>
yihong0618 added a commit to yihong0618/uftrace that referenced this issue Feb 8, 2024
The logic to increase uftrace_dbginfo_size should be compared
with uftrace_dbginfo_size instead of uftrace_symtab_size.

This looks like a mistake due to confusion with get_new_sym_addr.

Fixed: namhyung#1858
Signed-off-by: Yi Hong <zouzou0208@gmail.com>
Signed-off-by: Honggyu Kim <honggyu.kp@gmail.com>
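
The commit message above describes a grow check that compared an offset against the wrong size variable. The following Python model is only a sketch of that class of bug, not uftrace's actual C code: the function name `first_overflow`, the 16-byte initial size, the 4-byte entry size, and the doubling policy are all assumptions made for illustration. Only the ideas of a debug-info size and a symbol-table size come from the thread.

```python
INITIAL_DBGINFO_SIZE = 16  # hypothetical, kept tiny so the overflow appears quickly


def first_overflow(check_own_size, symtab_size, writes, entry_size=4):
    """Return the index of the first write that runs past the debug-info
    buffer, or None if every write fits.

    check_own_size=True models the fixed comparison (offset against the
    debug-info size); False models the buggy comparison against the
    symbol-table size.
    """
    dbginfo_size = INITIAL_DBGINFO_SIZE
    offset = 0
    for i in range(writes):
        limit = dbginfo_size if check_own_size else symtab_size
        if offset >= limit:
            dbginfo_size *= 2  # assumed growth policy: double the buffer
        if offset + entry_size > dbginfo_size:
            return i  # in a real program this would be an out-of-bounds write
        offset += entry_size
    return None


if __name__ == "__main__":
    # The symbol table has already grown to 64 bytes; debug info is still 16.
    print(first_overflow(check_own_size=False, symtab_size=64, writes=10))  # 4
    print(first_overflow(check_own_size=True, symtab_size=64, writes=10))   # None
```

In this model the buggy check never grows the debug-info buffer, because the offset stays below the larger symbol-table size. That is consistent with the out-of-bounds fault reported earlier in the thread, although the exact failure path inside uftrace is not shown here.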
shen390s pushed a commit to shen390s/uftrace that referenced this issue Feb 19, 2024
The logic to increase uftrace_dbginfo_size should be compared
with uftrace_dbginfo_size instead of uftrace_symtab_size.

This looks like a mistake due to confusion with get_new_sym_addr.

Fixed: namhyung#1858
Signed-off-by: Yi Hong <zouzou0208@gmail.com>
Signed-off-by: Honggyu Kim <honggyu.kp@gmail.com>
@namhyung namhyung added this to the v0.16 milestone Apr 21, 2024