# Speed Test

### **Conclusion**
- **About 12x faster on GPU than on CPU on Apple Silicon: ~53 ms/step vs ~680 ms/step (test Jan 24).**

---

### **Model Details**
- **Model Type:** Very Deep CNN (ResNet, 50 layers)
- **Total Parameters:** 23,792,612 (~90.76 MB)
- **Training Data:** 50,000 samples
- **Input Features:** `32 × 32 × 3 = 3,072` (≈ 3,000)
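
The size and feature counts above follow from simple arithmetic. A minimal sketch, assuming 32-bit (4-byte) float weights and reading "MB" as mebibytes (2^20 bytes); the variable names are illustrative:

```python
# Figures taken from the model details above.
total_params = 23_792_612
bytes_per_param = 4  # assumption: float32 weights

size_mib = total_params * bytes_per_param / 2**20
input_features = 32 * 32 * 3

print(f"Model size: {size_mib:.2f} MiB")
print(f"Input features: {input_features}")
```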

---

### **Rules of Thumb**
1. **Sample Size:**  
   `nb_samples > 10 × nb_features × nb_classes`  
   (with `nb_classes = 5` for regression).

2. **Parameter-Sample Ratio:**  
   `nb_parameters < nb_samples / 10`  
   (or even `nb_samples / 50` for deep learning).
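
The two rules can be checked against this benchmark's own numbers. A minimal sketch: `check_rules_of_thumb` is a hypothetical helper written for illustration, and `nb_classes = 100` assumes the CIFAR-100 setup used in the code cells.

```python
def check_rules_of_thumb(nb_samples, nb_features, nb_classes, nb_parameters):
    """Return the two thresholds and whether each rule of thumb holds."""
    min_samples = 10 * nb_features * nb_classes
    max_parameters = nb_samples / 10
    return {
        "min_samples": min_samples,
        "enough_samples": nb_samples > min_samples,
        "max_parameters": max_parameters,
        "few_enough_parameters": nb_parameters < max_parameters,
    }


result = check_rules_of_thumb(
    nb_samples=50_000,
    nb_features=32 * 32 * 3,
    nb_classes=100,  # assumption: CIFAR-100, as in the benchmark code
    nb_parameters=23_792_612,
)
print(result)
```

Both checks come out `False` for this setup. That is likely acceptable here, since the test measures training speed rather than model quality.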

---

### **GPU Performance**

#### **tf.keras**  
- Run 1: **221s** (~256ms/step), loss: `4.6651`, accuracy: `0.0858`
- Run 2 (with Apple Silicon M2 GPU): **47s** (~53ms/step), loss: `4.8064`, accuracy: `0.0620`

#### **Keras**  
- Run 1: **225s** (~263ms/step), loss: `4.7491`, accuracy: `0.0738`
- Run 2 (with Apple Silicon M2 GPU): **47s** (~53ms/step), loss: `4.6743`, accuracy: `0.0761`

---

### **CPU Performance**

#### **tf.keras**  
- Run 1: **1440s** (~1841ms/step), loss: `5.2095`, accuracy: `0.0346`
- Run 2 (with Apple Silicon M2 CPU): **541s** (~679ms/step), loss: `4.9699`, accuracy: `0.0534`

#### **Keras**  
- Run 1: **1440s** (~1841ms/step), loss: `5.6996`, accuracy: `0.0210`
- Run 2 (with Apple Silicon M2 CPU): **535s** (~671ms/step), loss: `5.0862`, accuracy: `0.0534`
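
The GPU speedup for each run can be derived from the epoch times listed above. A short sketch, with the timings copied from the GPU and CPU results; the dictionary layout and labels are illustrative:

```python
# Wall-clock seconds per epoch, copied from the results above.
timings = {
    ("tf.keras", "Run 1"): {"gpu": 221, "cpu": 1440},
    ("tf.keras", "Run 2 (M2)"): {"gpu": 47, "cpu": 541},
    ("Keras", "Run 1"): {"gpu": 225, "cpu": 1440},
    ("Keras", "Run 2 (M2)"): {"gpu": 47, "cpu": 535},
}

# Speedup = CPU time divided by GPU time for the same framework and run.
speedups = {key: t["cpu"] / t["gpu"] for key, t in timings.items()}

for (framework, run), speedup in speedups.items():
    print(f"{framework:8s} {run:11s} GPU speedup: {speedup:.1f}x")
```

Measured on epoch time, Run 1 gives roughly 6.5x and the Apple Silicon M2 run roughly 11.5x. The per-step figures for the M2 run (~679 ms vs ~53 ms) give a slightly higher ratio of about 12.8x.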


---

### **Key Takeaways**
- GPU training was about **6x faster** than CPU training in Run 1 (221s vs 1440s) and about **12x faster** on Apple Silicon M2 (47s vs ~540s).
- CPU training becomes impractically slow for large models and datasets.
- GPU is highly recommended for deep learning tasks, especially for very deep architectures like ResNet.


In [None]:
import tensorflow as tf

devices = tf.config.list_physical_devices()
print("\nDevices: ", devices)

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    details = tf.config.experimental.get_device_details(gpus[0])
    print("GPU details: ", details)

In [None]:
# Load the CIFAR-100 dataset (50,000 training and 10,000 test images)
from tensorflow import keras

cifar = keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()
print(x_train.shape, x_train[:2], y_train.shape, y_train[:2])
print(
    f"{x_train.shape[0]} train samples and {x_test.shape[0]} test samples, for a total of {x_train.shape[0] + x_test.shape[0]} samples."
)

In [None]:
import tensorflow as tf

model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)
model.summary()

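# With include_top=True the model ends in a softmax layer, so the loss
# receives probabilities rather than logits.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# One epoch at batch size 64 is ceil(50,000 / 64) = 782 steps.
model.fit(x_train, y_train, epochs=1, batch_size=64)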

In [None]:
# Same benchmark, using the standalone Keras package instead of tf.keras
import keras

model = keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,
)
model.summary()

loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=False)

model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, batch_size=64)
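
One way to record the epoch times independently of the Keras progress bar is to wrap each `model.fit` call in a timer. A minimal sketch using only the standard library: the `timed` helper is hypothetical (it is not part of Keras or TensorFlow), and the `time.sleep` call stands in for the training call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}
with timed("demo", results):
    # Placeholder for: model.fit(x_train, y_train, epochs=1, batch_size=64)
    time.sleep(0.05)

print(f"demo took {results['demo']:.2f}s")
```

Using a distinct label per device and framework (for example `"tf.keras-gpu"`) would keep all measurements in one dictionary for later comparison.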