Stable Diffusion inference very slow on Arc A770

I have been trying out a few examples and was able to run the Stable Diffusion [1] inference code on an Arc A770, but execution is very slow. Could you please help me debug why?

Most of the time is spent on XPU offload, and the process appears stuck for more than 3 minutes after the XPU TensorFlow device is created:

```bash
→ python sd_tf.py 
2022-11-04 08:59:45.881192: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-04 08:59:46.062025: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-04 08:59:46.093434: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 08:59:46.093455: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-04 08:59:46.130301: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-11-04 08:59:46.907078: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 08:59:46.907207: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 08:59:46.907217: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-11-04 08:59:48.299450: I itex/core/devices/gpu/dpcpp_runtime.cc:123] Selected platform: Intel(R) Level-Zero
2022-11-04 08:59:48.299851: I itex/core/devices/gpu/dpcpp_runtime.cc:148] number of sub-devices is zero, expose root device.
2022-11-04 08:59:48.301803: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 08:59:48.301982: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-04 08:59:48.301998: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (pop-os): /proc/driver/nvidia/version does not exist
2022-11-04 08:59:50.417331: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-04 08:59:50.419396: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-11-04 08:59:50.419440: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
2022-11-04 09:00:46.432620: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
 49 981:   0%|                                                                                         | 0/50 [00:00<?, ?it/s]2022-11-04 09:01:23.187110: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.
  0   1: 100%|████████████████████████████████████████████████████████████████████████████████| 50/50 [07:39<00:00,  9.20s/it]
```
After inference completes, there is another delay of about 2 minutes during which the process is busy but the XPU is not. I suspect this might be due to data movement between the CPU and the XPU device.
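
To pin down where the stalls are, the gaps between consecutive log timestamps can be measured directly. The sketch below is a hypothetical helper (`largest_gaps` and `TS_RE` are names made up here, not part of TensorFlow or ITEX). It assumes every relevant line carries a `YYYY-MM-DD HH:MM:SS.ffffff:` timestamp, as in the log above, and uses three lines abbreviated from that log as sample input.

```python
import re
from datetime import datetime

# TensorFlow/ITEX log lines carry a timestamp such as
# "2022-11-04 08:59:50.419440: I ...". The progress-bar line in the log
# above has its timestamp mid-line, so we search instead of matching at
# the start of the line.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}): ")


def largest_gaps(lines, top=3):
    """Return the `top` largest (gap_seconds, line_before, line_after) tuples."""
    stamped = []
    for line in lines:
        m = TS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((ts, line.strip()))
    gaps = [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Three lines abbreviated from the log above (file paths shortened).
sample = [
    "2022-11-04 08:59:50.419440: I pluggable_device_factory.cc:272] Created TensorFlow device",
    "2022-11-04 09:00:46.432620: I custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.",
    "2022-11-04 09:01:23.187110: I custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type XPU is enabled.",
]

for seconds, before, after in largest_gaps(sample):
    print(f"{seconds:8.3f} s after: {before[:70]}")
```

On these three lines it reports about 56 s between device creation and the first `Plugin optimizer` message, and about 37 s before the second one. Running it over the full stderr log would show whether other long silent periods exist.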

## Code to replicate:

```python
# coding: utf-8
import intel_extension_for_tensorflow as itex
import tensorflow as tf
from PIL import Image
from stable_diffusion_tf.stable_diffusion import StableDiffusion


def set_backend(backend="GPU"):
    #auto_mixed_precision_options = itex.AutoMixedPrecisionOptions()
    #auto_mixed_precision_options.data_type = itex.FLOAT16
    #graph_options = itex.GraphOptions(auto_mixed_precision_options=auto_mixed_precision_options)
    #graph_options.auto_mixed_precision = itex.ON
    #config = itex.ConfigProto(graph_options=graph_options)
    #itex.set_backend(backend, config)
    itex.set_backend(backend)


if __name__ == "__main__":
    set_backend()
    prompt = "Red air balloons in the blue sky evening golden rays from the sun paris"
    generator = StableDiffusion(
        img_height=512,
        img_width=512,
        jit_compile=False,
    )

    # Keep the generation and save steps inside the __main__ guard,
    # where `generator` is defined.
    for _ in range(1):
        img = generator.generate(
            prompt,
            num_steps=50,
            unconditional_guidance_scale=7.5,
            temperature=1,
            batch_size=1,
        )
    Image.fromarray(img[0]).save("/home/rahul/sd_tf_fp32.png")
```
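
To separate one-time start-up cost from steady-state cost, each phase of the script can be timed on its own. This is a minimal sketch using only the standard library; `timed` and `timings` are hypothetical names, and the workload below is a stand-in. In the script above, the same `with timed(...)` wrapper could go around the `StableDiffusion(...)` construction, the first `generator.generate(...)` call, a second identical call, and the `Image.fromarray(...).save(...)` step. If the second call is much faster than the first, the delay is likely a one-time warm-up rather than per-step data movement (that interpretation is an assumption, not something the logs confirm).

```python
import time
from contextlib import contextmanager

# Wall-clock seconds per labeled phase, filled in by `timed`.
timings = {}


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-in workload; replace with the model build / generate / save calls.
with timed("demo"):
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```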
With `ITEX_VERBOSE=1`, the logs look like:

```bash
head -n 100 tf_stderr.log 
2022-11-04 09:59:13.193048: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-04 09:59:13.312172: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-04 09:59:13.316717: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 09:59:13.316731: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-11-04 09:59:13.340501: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-11-04 09:59:13.888197: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 09:59:13.888277: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 09:59:13.888283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-11-04 09:59:14.653824: I ./itex/core/graph/remapper/fusion.h:150] Register fusion batchmatmulv2-with-mul-addv2 with AddV2
2022-11-04 09:59:14.653848: I ./itex/core/graph/remapper/fusion.h:150] Register fusion cast-bf16fusedmatmul-cast with Cast
2022-11-04 09:59:14.653854: I ./itex/core/graph/remapper/fusion.h:150] Register fusion cast-bf16fusedmatmul-cast with Cast
2022-11-04 09:59:14.653860: I ./itex/core/graph/remapper/fusion.h:150] Register fusion cast-bf16matmul-cast with Cast
2022-11-04 09:59:14.653866: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-with-conv_backprop_filter with Conv2DBackpropFilter
2022-11-04 09:59:14.653869: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-with-conv_backprop_filter with Conv2DBackpropFilterWithBias
2022-11-04 09:59:14.653872: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-with-conv_backprop_filter with Conv3DBackpropFilter
2022-11-04 09:59:14.653874: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-with-conv_backprop_filter with Conv3DBackpropFilterV2
2022-11-04 09:59:14.653877: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-with-conv_backprop_filter with Conv3DBackpropFilterWithBias
2022-11-04 09:59:14.653929: I ./itex/core/graph/remapper/fusion.h:150] Register fusion gru with AddV2
2022-11-04 09:59:14.654011: I ./itex/core/graph/remapper/fusion.h:150] Register fusion augru with AddV2
2022-11-04 09:59:14.654032: I ./itex/core/graph/remapper/fusion.h:150] Register fusion instancenorm with AddV2
2022-11-04 09:59:14.654058: I ./itex/core/graph/remapper/fusion.h:150] Register fusion InstanceNorm+LeakyRelu with LeakyRelu
2022-11-04 09:59:14.654091: I ./itex/core/graph/remapper/fusion.h:150] Register fusion InstanceNorm+Relu with Relu
2022-11-04 09:59:14.654103: I ./itex/core/graph/remapper/fusion.h:150] Register fusion layernorm with AddV2
2022-11-04 09:59:14.654116: I ./itex/core/graph/remapper/fusion.h:150] Register fusion layernorm-for-TransformerLT with AddV2
2022-11-04 09:59:14.654129: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-conv3d with Conv3D
2022-11-04 09:59:14.654138: I ./itex/core/graph/remapper/fusion.h:150] Register fusion pad-conv3d-with-cast with Conv3D
2022-11-04 09:59:14.654152: I ./itex/core/graph/remapper/fusion.h:150] Register fusion resize-nearest-neighbor with ConcatV2
2022-11-04 09:59:14.654167: I ./itex/core/graph/remapper/fusion.h:150] Register fusion resize-nearest-neighbor-grad with ConcatV2
2022-11-04 09:59:14.654172: I ./itex/core/graph/remapper/fusion.h:150] Register fusion resize-nearest-neighbor-grad-v2 with ConcatV2
2022-11-04 09:59:14.654177: I ./itex/core/graph/remapper/fusion.h:150] Register fusion squeeze-resize-nearest-neighbor-grad with ResizeNearestNeighborGrad
2022-11-04 09:59:14.654188: I ./itex/core/graph/remapper/fusion.h:150] Register fusion rmsprop-compute-rms with AddV2
2022-11-04 09:59:14.654196: I ./itex/core/graph/remapper/fusion.h:150] Register fusion rmsprop-var-update with Sub
2022-11-04 09:59:14.654201: I ./itex/core/graph/remapper/fusion.h:150] Register fusion sigmoid-with-mul with Mul
2022-11-04 09:59:14.654206: I ./itex/core/graph/remapper/fusion.h:150] Register fusion sigmoid-alpha-with-mul with Mul
2022-11-04 09:59:14.664960: I itex/core/graph/xpu_graph.cc:53] ITEX config onednn_graph is OFF.
2022-11-04 09:59:14.664975: I itex/core/graph/xpu_graph.cc:53] ITEX config remapper is ON.
2022-11-04 09:59:14.664978: I itex/core/graph/xpu_graph.cc:53] ITEX config layout_opt is ON.
2022-11-04 09:59:14.664981: I itex/core/graph/xpu_graph.cc:53] ITEX config native_format is OFF.
2022-11-04 09:59:14.664984: I itex/core/graph/xpu_graph.cc:53] ITEX config auto_mixed_precision is OFF.
2022-11-04 09:59:14.664987: I itex/core/graph/xpu_graph.cc:53] ITEX config tile_as_device is ON.
2022-11-04 09:59:14.664991: I itex/core/graph/xpu_graph.cc:53] ITEX config cache_onednn_object is ON.
2022-11-04 09:59:14.735153: I itex/core/devices/gpu/dpcpp_runtime.cc:123] Selected platform: Intel(R) Level-Zero
2022-11-04 09:59:14.735547: I itex/core/devices/gpu/dpcpp_runtime.cc:148] number of sub-devices is zero, expose root device.
2022-11-04 09:59:14.735770: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/env/intel/oneapi/vpl/2022.2.0/lib:/opt/env/intel/oneapi/tbb/2021.7.0/env/../lib/intel64/gcc4.8:/opt/env/intel/oneapi/mpi/2021.7.0//libfabric/lib:/opt/env/intel/oneapi/mpi/2021.7.0//lib/release:/opt/env/intel/oneapi/mpi/2021.7.0//lib:/opt/env/intel/oneapi/mkl/2022.2.0/lib/intel64:/opt/env/intel/oneapi/ipp/2021.6.1/lib/intel64:/opt/env/intel/oneapi/debugger/2021.7.0/gdb/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/libipt/intel64/lib:/opt/env/intel/oneapi/debugger/2021.7.0/dep/lib:/opt/env/intel/oneapi/dal/2021.7.0/lib/intel64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/x64:/opt/env/intel/oneapi/compiler/2022.2.0/linux/lib/oclfpga/host/linux64/lib:/opt/env/intel/oneapi/compiler/2022.2.0/linux/compiler/lib/intel64_lin:/opt/env/intel/oneapi/ccl/2021.7.0/lib/cpu_gpu_dpcpp:/usr/lib/x86_64-linux-gnu:
2022-11-04 09:59:14.735783: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2022-11-04 09:59:14.735800: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (pop-os): /proc/driver/nvidia/version does not exist
2022-11-04 09:59:16.752828: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-04 09:59:16.755135: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform XPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-11-04 09:59:16.755173: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:XPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: XPU, pci bus id: <undefined>)
2022-11-04 09:59:16.770372: I itex/core/devices/bfc_allocator.cc:26] Set memory limit to 15386382336 Bytes
2022-11-04 09:59:16.770400: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 256
2022-11-04 09:59:16.770404: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 512
2022-11-04 09:59:16.770407: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 1024
2022-11-04 09:59:16.770409: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 2048
2022-11-04 09:59:16.770411: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 4096
2022-11-04 09:59:16.770414: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 8192
2022-11-04 09:59:16.770416: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 16384
2022-11-04 09:59:16.770418: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 32768
2022-11-04 09:59:16.770421: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 65536
2022-11-04 09:59:16.770423: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 131072
2022-11-04 09:59:16.770425: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 262144
2022-11-04 09:59:16.770427: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 524288
2022-11-04 09:59:16.770430: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 1048576
2022-11-04 09:59:16.770432: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 2097152
2022-11-04 09:59:16.770434: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 4194304
2022-11-04 09:59:16.770437: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 8388608
2022-11-04 09:59:16.770439: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 16777216
2022-11-04 09:59:16.770441: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 33554432
2022-11-04 09:59:16.770444: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 67108864
2022-11-04 09:59:16.770446: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 134217728
2022-11-04 09:59:16.770448: I itex/core/devices/bfc_allocator.cc:37] Creating bin of max chunk size 268435456
2022-11-04 09:59:16.771330: I itex/core/devices/bfc_allocator.cc:302] Extending allocation by 4294967296 bytes.
2022-11-04 09:59:16.771340: I itex/core/devices/bfc_allocator.cc:305] Total allocated bytes: 4294967296
2022-11-04 09:59:16.771344: I itex/core/devices/bfc_allocator.cc:307] Allocated memory at 0xffffeaab55400000 to 0xffffeaac55400000
2022-11-04 09:59:17.678445: I ./itex/core/utils/op_kernel.h:726] AssignVariableOp,AssignVariableOp,11987
2022-11-04 09:59:17.678892: I ./itex/core/utils/op_kernel.h:726] AssignVariableOp,AssignVariableOp,3456
2022-11-04 09:59:17.679161: I ./itex/core/utils/op_kernel.h:726] AssignVariableOp,AssignVariableOp,2198
2022-11-04 09:59:17.796942: I ./itex/core/utils/op_kernel.h:726] StatelessRandomGetKeyCounter,StatelessRandomGetKeyCounter,476225
2022-11-04 09:59:18.357494: I ./itex/core/utils/op_kernel.h:726] StatelessRandomUniformV2,StatelessRandomUniformV2,547899149
2022-11-04 09:59:51.704338: I ./itex/core/utils/op_kernel.h:726] Sub,Sub,33344981458
2022-11-04 09:59:51.705890: I ./itex/core/utils/op_kernel.h:726] Mul,Mul,251732
2022-11-04 09:59:51.705949: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab5e4c6200
2022-11-04 09:59:51.707275: I ./itex/core/utils/op_kernel.h:726] AddV2,AddV2,440534
2022-11-04 09:59:51.707317: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab5e4c6300
2022-11-04 09:59:51.707323: I itex/core/devices/bfc_allocator.cc:345] Merging c 0xffffeaab5e4c6300 into c->prev 0xffffeaab5e4c6200
2022-11-04 09:59:51.707344: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab55406000
2022-11-04 09:59:51.707349: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab55406100
2022-11-04 09:59:51.707352: I itex/core/devices/bfc_allocator.cc:345] Merging c 0xffffeaab55406100 into c->prev 0xffffeaab55406000
2022-11-04 09:59:51.707355: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab55406200
2022-11-04 09:59:51.707358: I itex/core/devices/bfc_allocator.cc:337] Merging c->next 0xffffeaab5e4c6200 with c 0xffffeaab55406200
2022-11-04 09:59:51.707361: I itex/core/devices/bfc_allocator.cc:345] Merging c 0xffffeaab55406200 into c->prev 0xffffeaab55406000
2022-11-04 09:59:51.709042: I ./itex/core/utils/op_kernel.h:726] AssignVariableOp,AssignVariableOp,6500
2022-11-04 09:59:51.713602: I ./itex/core/utils/op_kernel.h:726] StatelessRandomGetKeyCounter,StatelessRandomGetKeyCounter,27836
2022-11-04 09:59:51.745497: I ./itex/core/utils/op_kernel.h:726] StatelessRandomUniformV2,StatelessRandomUniformV2,315099
2022-11-04 09:59:51.745686: I ./itex/core/utils/op_kernel.h:726] Sub,Sub,49634
2022-11-04 09:59:51.745776: I ./itex/core/utils/op_kernel.h:726] Mul,Mul,25621
2022-11-04 09:59:51.745792: I itex/core/devices/bfc_allocator.cc:106] Deallocate 0xffffeaab5543fe00
2022-11-04 09:59:51.745872: I ./itex/core/utils/op_kernel.h:726] AddV2,AddV2,23619
....
....
```
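
The `op_kernel.h` lines can be totalled per op to see which kernels dominate. The sketch below assumes each such line ends in `<node name>,<op type>,<duration>` and that the duration is in nanoseconds. That unit is inferred from the timestamps, not from ITEX documentation: the `Sub` line reporting 33344981458 is logged about 33.3 s after the preceding line. `op_totals` and `OP_RE` are hypothetical names, and the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches ITEX_VERBOSE=1 lines such as
# "... I ./itex/core/utils/op_kernel.h:726] Sub,Sub,33344981458".
# Assumption: fields are "<node name>,<op type>,<duration in ns>".
OP_RE = re.compile(r"op_kernel\.h:\d+\] ([^,]+),([^,]+),(\d+)\s*$")


def op_totals(lines):
    """Sum the trailing numeric field per op type: {op: (total, calls)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = OP_RE.search(line)
        if m:
            entry = totals[m.group(2)]
            entry[0] += int(m.group(3))
            entry[1] += 1
    return {op: tuple(v) for op, v in totals.items()}


# Lines copied from the verbose log above.
sample = [
    "2022-11-04 09:59:18.357494: I ./itex/core/utils/op_kernel.h:726] StatelessRandomUniformV2,StatelessRandomUniformV2,547899149",
    "2022-11-04 09:59:51.704338: I ./itex/core/utils/op_kernel.h:726] Sub,Sub,33344981458",
    "2022-11-04 09:59:51.705890: I ./itex/core/utils/op_kernel.h:726] Mul,Mul,251732",
    "2022-11-04 09:59:51.745686: I ./itex/core/utils/op_kernel.h:726] Sub,Sub,49634",
]

ranked = sorted(op_totals(sample).items(), key=lambda kv: kv[1][0], reverse=True)
for op, (total_ns, calls) in ranked:
    # Dividing by 1e9 assumes the logged value is in nanoseconds.
    print(f"{op:28s} {total_ns / 1e9:10.3f} s  ({calls} calls)")
```

To run it on the real output, pass the lines of `tf_stderr.log` to `op_totals` instead of `sample`. In the excerpt above, the first `Sub` call accounts for almost all of the reported time, while the second `Sub` call is several orders of magnitude shorter.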

[1] https://github.com/divamgupta/stable-diffusion-tensorflow
