## Gotchas when using Keras/TensorFlow

A couple of things to note when using TensorFlow as the backend for Keras:

### ALWAYS import Keras as `from tensorflow import keras`
If you just `import keras` (the standalone package) instead of `from tensorflow import keras`, things may mostly work, but there are places where it WILL break.

Before I noticed some weird behavior, I always used:
```python
import numpy as np
import keras
```

The "issue" did not pop up until I started using `fit_generator()`, where code that had worked fine suddenly threw a strange error. (Note: I only found out about this while experimenting with MXNet as the Keras backend. MXNet recommends using `channels_first`.)

### channels_first vs channels_last
When running on an NVIDIA GPU with the cuDNN library (used together with CUDA), it's best to set `image_data_format` in `keras.json` (usually `~/.keras/keras.json`) to `channels_first`:

```json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_first"
}
```
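If you switch between the two formats often, a small script can generate `keras.json` for you. This is a minimal sketch, assuming the default location `~/.keras/keras.json`; `make_keras_config` and `save_keras_config` are hypothetical helpers, not part of Keras. Keras reads this file when it is imported, so write it before importing Keras. With `tf.keras` you can also switch at runtime by calling `keras.backend.set_image_data_format("channels_first")` before building the model.

```python
import json
import os

VALID_FORMATS = ("channels_first", "channels_last")


def make_keras_config(image_data_format="channels_first", backend="tensorflow"):
    """Build the keras.json settings shown above (hypothetical helper)."""
    if image_data_format not in VALID_FORMATS:
        raise ValueError(f"image_data_format must be one of {VALID_FORMATS}")
    return {
        "floatx": "float32",
        "epsilon": 1e-07,
        "backend": backend,
        "image_data_format": image_data_format,
    }


def save_keras_config(config, path=os.path.join("~", ".keras", "keras.json")):
    """Write the config as JSON; run this before Keras is first imported."""
    path = os.path.expanduser(path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path

# Usage (overwrites any existing keras.json):
# save_keras_config(make_keras_config("channels_first"))
```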

First, let's look at the benchmark with TensorFlow's default, `"channels_last"`. This is the content of the `keras.json` file:
```json
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}
```

I won't reproduce the full training script here, but the model used for the MNIST benchmark is listed below:

```python
from tensorflow import keras

# MNIST images are 28x28 grayscale; where the single channel axis goes
# depends on "image_data_format" in keras.json
if keras.backend.image_data_format() == 'channels_first':
    input_shape = (1, 28, 28)
else:
    input_shape = (28, 28, 1)

# define the model
model = keras.models.Sequential()
model.add(keras.layers.Conv2D(32, kernel_size=(3, 3), input_shape=input_shape, activation='relu'))
model.add(keras.layers.Dropout(rate=0.05))
model.add(keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

model.add(keras.layers.Dropout(rate=0.5))

model.add(keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))

model.add(keras.layers.Dropout(rate=0.5))

model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(265, activation='relu'))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(10, activation='softmax'))

# compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
```
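Because `input_shape` depends on the data format, the training arrays have to match it as well. A minimal sketch with NumPy (already imported in the snippets above), assuming the raw MNIST arrays have shape `(N, 28, 28)` as returned by `keras.datasets.mnist.load_data()`; `add_channel_axis` and `to_channels_first` are hypothetical helper names.

```python
import numpy as np


def add_channel_axis(images, data_format):
    """Turn (N, H, W) grayscale images into the 4D layout Conv2D expects.

    'channels_last'  -> (N, H, W, 1)
    'channels_first' -> (N, 1, H, W)
    """
    images = np.asarray(images, dtype="float32")
    if data_format == "channels_first":
        return images[:, np.newaxis, :, :]
    if data_format == "channels_last":
        return images[..., np.newaxis]
    raise ValueError(f"unknown data_format: {data_format!r}")


def to_channels_first(batch):
    """Reorder an existing (N, H, W, C) batch to (N, C, H, W)."""
    return np.transpose(batch, (0, 3, 1, 2))

# Usage, keeping the data in sync with keras.json:
# x_train = add_channel_axis(x_train, keras.backend.image_data_format())
```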
### The output
I am running the benchmark on an old NVIDIA GTX 770 (2 GB VRAM), which is a slow card:
```

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
dropout_1 (Dropout)          (None, 26, 26, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 32)        9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 32)        0
_________________________________________________________________
dropout_2 (Dropout)          (None, 12, 12, 32)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 10, 10, 64)        18496
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 64)          0
_________________________________________________________________
dropout_3 (Dropout)          (None, 4, 4, 64)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_1 (Dense)              (None, 265)               271625
_________________________________________________________________
dropout_4 (Dropout)          (None, 265)               0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2660
=================================================================
Total params: 339,277
Trainable params: 339,277
Non-trainable params: 0
_________________________________________________________________

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 11s 191us/step - loss: 0.4193 - acc: 0.8634 - val_loss: 0.0685 - val_acc: 0.9775
Epoch 2/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.1197 - acc: 0.9632 - val_loss: 0.0414 - val_acc: 0.9861
Epoch 3/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0880 - acc: 0.9732 - val_loss: 0.0313 - val_acc: 0.9903
Epoch 4/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0719 - acc: 0.9778 - val_loss: 0.0299 - val_acc: 0.9906
Epoch 5/5
60000/60000 [==============================] - 9s 151us/step - loss: 0.0622 - acc: 0.9809 - val_loss: 0.0258 - val_acc: 0.9920
```

### The same code, but with `image_data_format` set to `"channels_first"`
```
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
2019-03-07 04:56:08.997189: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
60000/60000 [==============================] - 10s 169us/step - loss: 0.3917 - acc: 0.8727 - val_loss: 0.0565 - val_acc: 0.9812
Epoch 2/5
60000/60000 [==============================] - 9s 144us/step - loss: 0.1133 - acc: 0.9652 - val_loss: 0.0395 - val_acc: 0.9881
Epoch 3/5
60000/60000 [==============================] - 9s 145us/step - loss: 0.0827 - acc: 0.9750 - val_loss: 0.0286 - val_acc: 0.9912
Epoch 4/5
60000/60000 [==============================] - 9s 144us/step - loss: 0.0698 - acc: 0.9782 - val_loss: 0.0245 - val_acc: 0.9919
Epoch 5/5
60000/60000 [==============================] - 9s 143us/step - loss: 0.0591 - acc: 0.9823 - val_loss: 0.0216 - val_acc: 0.9928
```

As you can see, `channels_first` is roughly 4-5% faster per step here (about 144 vs. 151 µs/step once the first epoch is out of the way). According to the MXNet folks, the speedup can reach 2x on a faster GPU running a more complex model.
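The percentage can be checked directly from the two logs. A minimal sketch that averages the per-step times of epochs 2-5 from each run (skipping epoch 1, which I'm assuming includes one-off warm-up cost such as loading CUDA libraries) and reports both the reduction in time per step and the gain in throughput:

```python
from statistics import mean

# Per-step times (microseconds) copied from the two training logs above
channels_last_us = [191, 151, 151, 151, 151]
channels_first_us = [169, 144, 145, 144, 143]

# Skip epoch 1 (assumed warm-up cost) and average the rest
last = mean(channels_last_us[1:])
first = mean(channels_first_us[1:])

time_saved_pct = (last - first) / last * 100      # less time per step
throughput_gain_pct = (last / first - 1) * 100    # more steps per second

print(f"time per step reduced by {time_saved_pct:.1f}%")
print(f"throughput increased by {throughput_gain_pct:.1f}%")
```

The two numbers differ slightly because one is relative to the slower run and the other to the faster run; both land in the 4-5% range.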

### The difference between `import keras` and `from tensorflow import keras`
First, the reported Keras version differs depending on how you import it:

```python
import keras
print(keras.__version__)
```
This prints: `2.2.4.1`

```python
from tensorflow import keras
print(keras.__version__)
```
This prints: `2.2.4-tf`

### The Issue
The problem with plain `import keras` showed up when running `fit_generator()`:
```python
# augment the training images with random rotations, shifts and zooms
datagen = keras.preprocessing.image.ImageDataGenerator(
  rotation_range=45,
  width_shift_range=0.15,
  height_shift_range=0.15,
  zoom_range=0.2,
  fill_mode='nearest'
)

# compute data-dependent statistics; this is only required when
# featurewise_center, featurewise_std_normalization or zca_whitening is set
datagen.fit(x_train)

fit_history2 = model.fit_generator(
  datagen.flow(x_train, y_train, batch_size=200),
  epochs=150,
  validation_data=(x_test, y_test)
)
```

The error message thrown was:
```text
ValueError: `steps_per_epoch=None` is only valid for a generator based on the `keras.utils.Sequence` class. Please specify `steps_per_epoch` or use the `keras.utils.Sequence` class.
```

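Of course, I could have fixed the issue by passing `steps_per_epoch` explicitly. Somehow the TensorFlow version "knew" that value (presumably from the length of the iterator returned by `datagen.flow()`), but standalone Keras did not. Note that `steps_per_epoch` is the number of batches per epoch, not samples, so one pass over the training set is `len(x_train) // batch_size` steps; passing `len(x_train)` would make each epoch 200 times longer:

```python
batch_size = 200

fit_history2 = model.fit_generator(
  datagen.flow(x_train, y_train, batch_size=batch_size),
  # one epoch = one pass over x_train: 60000 // 200 = 300 batches for MNIST
  steps_per_epoch=len(x_train) // batch_size,
  epochs=150,
  validation_data=(x_test, y_test)
)
```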
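`steps_per_epoch` counts batches, and integer division silently drops a final partial batch when the number of samples isn't a multiple of the batch size. A small sketch that rounds up instead (`steps_for_epoch` is a hypothetical helper, not a Keras function); as far as I know, this matches how the iterator returned by `datagen.flow()` reports its length:

```python
def steps_for_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (hypothetical helper).

    Uses integer ceiling division so a final partial batch is not dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (num_samples + batch_size - 1) // batch_size

# MNIST: 60000 training samples with batch_size=200
print(steps_for_epoch(60000, 200))
print(steps_for_epoch(60001, 200))
```

For MNIST the two approaches agree (300 steps), because 60,000 divides evenly by 200.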