# Speed Test

### **Conclusion**
- **On Apple Silicon, training is ~1.8X faster on GPU than on CPU. On Intel processors, training is ~2X faster on CPU (Metal is not supported on Intel processors).**
- **CuDNN** (NVIDIA's deep learning library) significantly speeds up `keras.layers.LSTM` compared to a generic `keras.layers.RNN` wrapping an `LSTMCell`, which cannot use the CuDNN kernel. (See the LSTM vs. RNN discussion in the Keras documentation.)

---

### **Model Details**
- **Model Type:** BiLSTM  
- **Total Parameters:** 2,658,945 (~10.14 MB)  
- **Training Data:** 25,000 sequences  
- **Features:** 1 (word index, ranked by number of occurrences in the corpus)
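
The parameter count above can be reproduced by hand from the layer sizes used in the code cells further down (vocabulary of 20,000, embedding size 128, 64 LSTM units per direction). The sketch below assumes the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and a bias) and 32-bit floats. The helper name `lstm_params` is illustrative and not part of Keras.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * ((input_dim + units) * units + units)


vocab_size, embed_dim, lstm_units = 20_000, 128, 64

embedding = vocab_size * embed_dim               # one vector per vocabulary entry
bilstm = 2 * lstm_params(embed_dim, lstm_units)  # forward + backward LSTM
dense = 2 * lstm_units * 1 + 1                   # weights on the concatenated states + bias
total = embedding + bilstm + dense

size_mib = total * 4 / 1024**2                   # assuming 4 bytes per float32 parameter

print(f"{total:,} parameters, {size_mib:.2f} MiB")
```

Almost all of the parameters sit in the embedding table, so the vocabulary size is the main lever on model size here.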

---

### **Rules of Thumb**
1. **Sample Size:**  
   `nb_samples > 10 × nb_features × nb_classes`  
   (with `nb_classes = 5` for regression).

2. **Parameter-Sample Ratio:**  
   `nb_parameters < nb_samples / 10`  
   (or even `nb_samples / 50` for deep learning).
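
As a rough illustration, the two rules can be checked against the figures listed under Model Details. This is only a sketch: `nb_classes = 2` is an assumption (binary sentiment labels), and the helper names are hypothetical.

```python
def min_samples(nb_features: int, nb_classes: int) -> int:
    """Rule 1 threshold: nb_samples should exceed this value."""
    return 10 * nb_features * nb_classes


def max_parameters(nb_samples: int, ratio: int = 10) -> float:
    """Rule 2 threshold: nb_parameters should stay below this value."""
    return nb_samples / ratio


nb_samples = 25_000
nb_features = 1
nb_classes = 2             # assumption: binary classification
nb_parameters = 2_658_945

enough_samples = nb_samples > min_samples(nb_features, nb_classes)
few_enough_params = nb_parameters < max_parameters(nb_samples)

print(f"Rule 1 (sample size) satisfied: {enough_samples}")
print(f"Rule 2 (parameter-sample ratio) satisfied: {few_enough_params}")
```

Under these assumptions the first rule is met while the second is not, which is common for models with a large embedding table. Passing `ratio=50` to `max_parameters` applies the stricter deep-learning variant.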

---

### **GPU Performance**

#### **tf.keras**  
- Run 1: **146s** (~179ms/step) — loss: `0.4118`, accuracy: `0.8075`, val_loss: `0.3565`, val_accuracy: `0.8464`  
- Run 2 (Apple Silicon): **29s** (~32ms/step) — loss: `0.4167`, accuracy: `0.8018`, val_loss: `0.3612`, val_accuracy: `0.8474`

#### **Keras**  
- Run 1: **119s** (~146ms/step) — loss: `0.4102`, accuracy: `0.8142`, val_loss: `0.3496`, val_accuracy: `0.8467`  
- Run 2 (Apple Silicon): **25s** (~30ms/step) — loss: `0.4167`, accuracy: `0.8018`, val_loss: `0.3611`, val_accuracy: `0.8474`

---

### **CPU Performance**

#### **tf.keras**  
- Run 1: **73s** (~89ms/step) — loss: `0.4184`, accuracy: `0.8078`, val_loss: `0.3444`, val_accuracy: `0.8507`  
- Run 2 (Apple Silicon): **47s** (~59ms/step) — loss: `0.5113`, accuracy: `0.7238`, val_loss: `0.3405`, val_accuracy: `0.8538`

#### **Keras**  
- Run 1: **81s** (~97ms/step) — loss: `0.4027`, accuracy: `0.8147`, val_loss: `0.3395`, val_accuracy: `0.8508`  
- Run 2 (Apple Silicon): **47s** (~58ms/step) — loss: `0.5113`, accuracy: `0.7238`, val_loss: `0.3405`, val_accuracy: `0.853`

---

### **Key Observations**
1. **CPU vs GPU:**  
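   - On Apple Silicon, GPU training is roughly **1.6X to 1.9X faster** than CPU for this BiLSTM model (25s to 29s vs. 47s). In Run 1, CPU training was faster than GPU (73s to 81s vs. 119s to 146s).  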

2. **Accuracy and Loss:**  
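   - CPU and GPU runs reach similar validation accuracy (0.846 to 0.854) and validation loss (0.34 to 0.36), which suggests the two are functionally equivalent. Training accuracy after a single epoch varies more (0.72 to 0.81).  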

3. **Recommendation:**  
   - Use CPU for smaller datasets or models like BiLSTM.  
   - For larger datasets or deeper architectures, GPU with CuDNN can provide significant speedups.
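
The speedups discussed above can be recomputed directly from the epoch times reported in the GPU and CPU sections. A minimal sketch follows. The dictionary layout and the `gpu_speedup` helper are illustrative only. A value above 1 means the GPU run was faster, and a value below 1 means the CPU run was faster.

```python
# Epoch times in seconds, copied from the GPU and CPU performance sections.
timings = {
    ("tf.keras", "Run 1"): {"gpu": 146, "cpu": 73},
    ("tf.keras", "Run 2 (Apple Silicon)"): {"gpu": 29, "cpu": 47},
    ("Keras", "Run 1"): {"gpu": 119, "cpu": 81},
    ("Keras", "Run 2 (Apple Silicon)"): {"gpu": 25, "cpu": 47},
}


def gpu_speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


speedups = {key: gpu_speedup(t["cpu"], t["gpu"]) for key, t in timings.items()}

for (api, run), speedup in speedups.items():
    print(f"{api:<8} {run:<22} GPU speedup: {speedup:.2f}x")
```

Note that these times cover a single epoch including validation, so they are only a coarse indication.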


In [None]:
import tensorflow as tf

devices = tf.config.list_physical_devices()
print("\nDevices: ", devices)

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print("GPU details: ", details)

In [None]:
# get data
import tensorflow as tf
from tensorflow import keras
import numpy as np

np.random.seed(42)  # for reproducibility

max_features = 20000
maxlen = 100  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
    num_words=max_features
)

x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)
print(x_train.shape, x_train[:2], y_train.shape, y_train[:2])
print(
    f"{x_train.shape[0]} train samples and {x_test.shape[0]} test samples, for a total of {x_train.shape[0] + x_test.shape[0]} samples."
)
y_train = np.array(y_train)
y_test = np.array(y_test)

In [None]:
from tensorflow import keras

print(f"Using tf.keras version {keras.__version__}")

# Define the input layer
inputs = keras.Input(shape=(maxlen,))  # maxlen is the input length for each sequence

# Embedding layer (max_features is the vocabulary size)
x = keras.layers.Embedding(max_features, 128)(inputs)

# Bidirectional LSTM layer
x = keras.layers.Bidirectional(keras.layers.LSTM(64))(x)

# Dropout layer for regularization
x = keras.layers.Dropout(0.5)(x)

# Output layer (for binary classification)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

# Define the model with inputs and outputs
model = keras.Model(inputs=inputs, outputs=outputs, name="BiLSTM")

# Print model summary
model.summary()

# Compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train the model
print("Train...")
model.fit(
    x_train, y_train, batch_size=batch_size, epochs=1, validation_data=(x_test, y_test)
)

In [None]:
import keras

print(f"Using keras version {keras.__version__}")

model = keras.models.Sequential(name="BiLSTM")
# Declare the input shape explicitly; `input_length` is deprecated in Keras 3.
model.add(keras.Input(shape=(maxlen,)))
model.add(keras.layers.Embedding(max_features, 128))
model.add(keras.layers.Bidirectional(keras.layers.LSTM(64)))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(1, activation="sigmoid"))
model.summary()
# try using different optimizers and different optimizer configs
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])

print("Train...")
model.fit(
    x_train, y_train, batch_size=batch_size, epochs=1, validation_data=(x_test, y_test)
)

In [None]:
# # Comparing GPU performance of LSTM using CuDNN (keras.layers.LSTM) and not using CuDNN (keras.layers.LSTMCell)

# from tensorflow import keras
# from keras.models import Sequential
# from keras.layers import Dense, Dropout, Embedding, LSTM, Input, Bidirectional

# keras.utils.set_random_seed(42)


# def build_model(allow_cudnn_kernel=True):
#     # CuDNN is only available at the layer level, and not at the cell level.
#     # This means `LSTM(units)` will use the CuDNN kernel,
#     # while RNN(LSTMCell(units)) will run on non-CuDNN kernel.
#     if allow_cudnn_kernel:
#         # The LSTM layer with default options uses CuDNN.
#         model = Sequential(name="BiLSTM with CuDNN")
#         lstm_layer = keras.layers.LSTM(64)
#     else:
#         # Wrapping a LSTMCell in a RNN layer will not use CuDNN.
#         model = Sequential(name="BiLSTM without CuDNN")
#         lstm_layer = keras.layers.RNN(keras.layers.LSTMCell(64))
#     model.add(Embedding(max_features, 128, input_length=maxlen))
#     model.add(Bidirectional(lstm_layer))
#     model.add(Dropout(0.5))
#     model.add(Dense(1, activation="sigmoid"))
#     model.summary()
#     return model


# model = build_model(allow_cudnn_kernel=True)
# model.compile("adam", "binary_crossentropy", metrics=["accuracy"])

# print("Train model using CuDNN kernel...")
# model.fit(
#     x_train, y_train, batch_size=batch_size, epochs=1, validation_data=(x_test, y_test)
# )


# model_noncudnn = build_model(allow_cudnn_kernel=False)
# model_noncudnn.compile("adam", "binary_crossentropy", metrics=["accuracy"])

# print("Train model not using CuDNN kernel...")
# model_noncudnn.fit(
#     x_train, y_train, batch_size=batch_size, epochs=1, validation_data=(x_test, y_test)
# )