## Why does the model predict significantly slower after `compile`?
I have noticed that after calling `.compile()`, the model predicts much more slowly, and it stays slow even after training.

This means it can hurt real-time inference, such as object detection on a webcam feed.

This experiment tries to reproduce the issue as clearly as possible.

See related question here: https://stackoverflow.com/q/58378374/2593810

In [1]:
import tensorflow as tf
kr = tf.keras
import numpy as np
np.set_printoptions(suppress=True)
tf.__version__, kr.__version__, np.__version__

('2.0.0', '2.2.4-tf', '1.16.5')

In [2]:
model = kr.Sequential([
    kr.layers.Dense(2000, activation='relu', input_shape=(5,)),
    kr.layers.Dense(2000, activation='relu'),
    kr.layers.Dense(5, activation='softmax')
])
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2000)              12000     
_________________________________________________________________
dense_1 (Dense)              (None, 2000)              4002000   
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 10005     
Total params: 4,024,005
Trainable params: 4,024,005
Non-trainable params: 0
_________________________________________________________________


### Test prediction speed

In [3]:
x = np.random.random((1, 5))

In [4]:
model.predict(x)

array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]],
      dtype=float32)

In [5]:
%%timeit -n 20
model.predict(x)

2.93 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)


### Compile and test speed

In [6]:
model.compile(optimizer=kr.optimizers.SGD(momentum=0.9),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

In [7]:
model.predict(x)

array([[0.19717476, 0.19599418, 0.2063633 , 0.20119473, 0.19927306]],
      dtype=float32)

In [8]:
%%timeit -n 20
model.predict(x)

27 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)


_Notice that prediction after `compile()` is significantly slower than before: about 27 ms versus 2.93 ms per call, roughly 9 times slower._
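To repeat this comparison outside a notebook, where `%%timeit` is not available, something like the following sketch could be used. It relies only on the standard library; the helper name `time_per_loop` is hypothetical, and it only approximately mirrors what `%%timeit -n 20` reports (mean and standard deviation of the per-loop time over 7 runs).

```python
import statistics
import timeit


def time_per_loop(fn, number=20, repeat=7):
    """Return (mean, std dev) of seconds per call over `repeat` runs of `number` calls."""
    # timeit.repeat returns one total time per run; divide to get per-call time
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


# Stand-in workload so the sketch runs on its own
mean_s, std_s = time_per_loop(lambda: sum(range(1000)))
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e6:.1f} µs per loop")
```

In this experiment the workload would be `lambda: model.predict(x)`, measured once before and once after `model.compile(...)`.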

### Train and test speed

In [9]:
from sklearn.model_selection import train_test_split
# create dummy dataset, where y is the index of the largest value in each row of X
X = np.random.random((5000, 5))
Y = np.argmax(X, axis=1)
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)

In [10]:
%%time
model.fit(x_train, y_train, epochs=5, validation_split=0.2)

Train on 3200 samples, validate on 800 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Wall time: 15.2 s


<tensorflow.python.keras.callbacks.History at 0x11c9c72e588>

In [11]:
model.evaluate(x_test, y_test, batch_size=128, verbose=0)

[0.2094351042509079, 0.942]

In [12]:
x, model.predict(x)

(array([[0.71759054, 0.88347487, 0.21729862, 0.01851623, 0.87170631]]),
 array([[0.02285822, 0.7512861 , 0.00000684, 0.00000017, 0.22584875]],
       dtype=float32))

In [13]:
%%timeit -n 20
model.predict(x)

28.3 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 20 loops each)


_Notice that prediction is still slow after fitting (28.3 ms per call)._

### I tried saving the model in HDF5 format without the optimizer, and the speed came back

In [14]:
model.save('model.h5', include_optimizer=False, save_format='h5')
model2 = kr.models.load_model('model.h5')



In [15]:
%%timeit -n 20
model2.predict(x)

3.28 ms ± 555 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)
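The save/load round trip appears to help because the reloaded model carries no compile state. A possible in-memory alternative is sketched below, using `kr.models.clone_model` together with `get_weights()`/`set_weights()`. That this also restores the pre-compile speed is an assumption and was not measured in this notebook, so it should be timed on your own version. The variable name `uncompiled` is only illustrative.

```python
import numpy as np
import tensorflow as tf

kr = tf.keras

model = kr.Sequential([
    kr.layers.Dense(2000, activation='relu', input_shape=(5,)),
    kr.layers.Dense(2000, activation='relu'),
    kr.layers.Dense(5, activation='softmax')
])
model.compile(optimizer=kr.optimizers.SGD(momentum=0.9),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# clone_model creates new layers with newly initialized weights
# and does not carry over the compile state of the original model
uncompiled = kr.models.clone_model(model)
# copy the weights so both models compute the same function
uncompiled.set_weights(model.get_weights())

x = np.random.random((1, 5))
same = np.allclose(model.predict(x), uncompiled.predict(x), atol=1e-6)
print(same)
```

Timing `uncompiled.predict(x)` against `model.predict(x)` would show whether the copy avoids the slowdown.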


## Conclusion
The model is indeed slower after `compile()`, but why? _I don't know._

I'm quite certain that it is a bug or an unintended side effect.

As a user, you expect `predict()` to run as fast as possible, because it is the primary way to get predictions from the model.

Think about the `numpy` equivalent: `Dense` layers are simply matrix multiplications, vector additions, and non-linear activations,
applied to the input `x` and the weights inside the model. That computation does not get slower or faster unless something extra is being computed.
`predict()` should take roughly the same time for the same input shape, whether or not the model is compiled. If it deviates from this, it is likely a bug.
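To make the `numpy` comparison concrete, here is a minimal sketch of the same three-layer forward pass. The weights are random stand-ins (an assumption for illustration, not the trained model's weights), and the helper names `relu`, `softmax`, and `forward` are hypothetical.

```python
import timeit

import numpy as np

rng = np.random.RandomState(0)


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    # subtract the row maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


# Random stand-in weights with the same shapes as the Keras model (5 -> 2000 -> 2000 -> 5)
W1 = rng.normal(scale=0.05, size=(5, 2000)).astype(np.float32)
b1 = np.zeros(2000, dtype=np.float32)
W2 = rng.normal(scale=0.05, size=(2000, 2000)).astype(np.float32)
b2 = np.zeros(2000, dtype=np.float32)
W3 = rng.normal(scale=0.05, size=(2000, 5)).astype(np.float32)
b3 = np.zeros(5, dtype=np.float32)


def forward(x):
    h = relu(x @ W1 + b1)
    h = relu(h @ W2 + b2)
    return softmax(h @ W3 + b3)


x = rng.random_sample((1, 5)).astype(np.float32)
probs = forward(x)

# average seconds per call over 20 calls
per_call = timeit.timeit(lambda: forward(x), number=20) / 20
print(f"{per_call * 1e3:.3f} ms per call")
```

Nothing in this computation depends on an optimizer, loss, or metrics, which is why the cost of a forward pass would be expected to stay the same before and after `compile()`.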

Let's see what solution comes out of this GitHub issue: https://github.com/tensorflow/tensorflow/issues/33340