<a href="https://colab.research.google.com/github/rishi-latchmepersad/tensorflow_tutorials/blob/main/getting_started_with_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
# install or upgrade the packages first, so the imports below pick up the new versions
!pip install --upgrade keras
!pip install --upgrade keras-cv
!pip install --upgrade keras-hub

import os
# the backend must be selected before keras is imported
os.environ["KERAS_BACKEND"] = "tensorflow"
import keras
import numpy as np
from keras import layers as L
from keras import ops



In [3]:
# MNIST is a dataset of 70,000 handwritten digits (0-9), commonly used to benchmark image classifiers
# we first load the data, which comes already split into train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# normalize the images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# the mnist dataset provides a set of 28x28 greyscale images, but it doesn't explicitly set the number of channels to be 1 (greyscale)
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
# so we add the extra dimension to the end for use in the later convolution layers
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
x_train shape: (60000, 28, 28)
x_test shape: (10000, 28, 28)
x_train shape: (60000, 28, 28, 1)
x_test shape: (10000, 28, 28, 1)
60000 train samples
10000 test samples
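
The scaling and channel-axis steps can be checked without downloading MNIST. This is a minimal sketch, assuming a zero-filled stand-in batch; the names `fake_batch`, `scaled`, and `with_channel` are hypothetical and not part of the tutorial.

```python
import numpy as np

# Hypothetical stand-in for a small batch of MNIST images: 4 images of 28x28 pixels
fake_batch = np.zeros((4, 28, 28), dtype="uint8")
fake_batch[0, 0, 0] = 255  # one fully bright pixel

# Scale to the [0, 1] range as float32, matching the preprocessing above
scaled = fake_batch.astype("float32") / 255

# Append a trailing channel axis so the layout is (batch, height, width, channels)
with_channel = np.expand_dims(scaled, -1)

print(with_channel.shape)  # (4, 28, 28, 1)
print(with_channel.dtype, with_channel.max())
```

The trailing axis carries no new data; it only gives convolution layers the channels-last layout they expect.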


# Core Deep Learning Model Families (cheat sheet)

## MLP / Feed-Forward
- **Use for:** tabular data, small classifiers, quick baselines  
- **Idea:** dense layers on flattened inputs  
- **Keras layers:** `Dense`, `Dropout`  
- **TinyML notes:** very compact, but loses spatial or temporal structure

## CNNs

### 2D CNN
- **Use for:** images, small vision tasks like digits or gauges  
- **Idea:** local receptive fields with shared weights  
- **Keras layers:** `Conv2D`, `DepthwiseConv2D`, `MaxPool2D`, `GlobalAveragePooling2D`  
- **TinyML notes:** the workhorse with strong TFLite Micro support

### 1D CNN
- **Use for:** time series, sensor streams, audio features  
- **Idea:** same as 2D but along a single axis  
- **Keras layers:** `Conv1D`, `GlobalAveragePooling1D`  
- **TinyML notes:** fast and memory friendly for forecasting

### Depthwise-separable CNN
- **Use for:** mobile or MCU efficiency with good accuracy  
- **Idea:** depthwise conv per channel, then pointwise `1×1`  
- **Keras layers:** `DepthwiseConv2D`, `Conv2D` with `kernel_size=1`  
- **TinyML notes:** big savings in parameters and MACs
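
The parameter savings can be estimated with plain arithmetic. The sketch below assumes a depth multiplier of 1 and a bias term in every layer; the helper names `conv2d_params` and `separable_params` are made up for illustration and are not Keras functions.

```python
def conv2d_params(kernel, c_in, c_out, use_bias=True):
    """Trainable weights in a standard kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + (c_out if use_bias else 0)


def separable_params(kernel, c_in, c_out, use_bias=True):
    """Depthwise kernel x kernel conv (one filter per input channel), then a pointwise 1x1 conv."""
    depthwise = kernel * kernel * c_in + (c_in if use_bias else 0)
    pointwise = c_in * c_out + (c_out if use_bias else 0)
    return depthwise + pointwise


standard = conv2d_params(3, 64, 128)
separable = separable_params(3, 64, 128)
print(standard, separable, round(standard / separable, 2))  # 73856 8960 8.24
```

For a 3×3 layer going from 64 to 128 channels, the separable version needs roughly one eighth of the weights under these assumptions.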

## Residual CNNs and Inverted Residuals
- **Use for:** deeper small models that train stably  
- **Idea:** skip connections, MobileNetV2 style bottlenecks  
- **Keras layers:** `Add` plus the usual convs  
- **TinyML notes:** small extra RAM for the skip, usually worth it

## TCN, Temporal Convolutional Networks
- **Use for:** forecasting and sequence modeling without recurrence  
- **Idea:** dilated causal `Conv1D` for long context  
- **Keras layers:** `Conv1D` with `dilation_rate`, `padding="causal"`  
- **TinyML notes:** only conv ops, very deployment friendly
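
How dilation buys long context can be seen by counting the time steps a stack of causal convolutions covers. This is a rough sketch assuming one `Conv1D` layer per dilation rate, stride 1, and the same kernel size throughout; `tcn_receptive_field` is a hypothetical helper, not a Keras function.

```python
def tcn_receptive_field(kernel_size, dilations):
    """Time steps (including the current one) visible to the last layer
    of a stack of causal convolutions, one layer per dilation rate."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(tcn_receptive_field(3, [1, 1, 1, 1]))  # 9: four undilated layers
print(tcn_receptive_field(3, [1, 2, 4, 8]))  # 31: same depth, doubling dilation
```

With the same number of layers and weights, doubling the dilation rate grows the context exponentially rather than linearly.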

## RNN / LSTM / GRU
- **Use for:** sequences where order and long memory matter  
- **Idea:** recurrent state flows step to step  
- **Keras layers:** `SimpleRNN`, `LSTM`, `GRU`  
- **TinyML notes:** heavier than Conv1D or TCN; TFLite supports them, but TFLite Micro support is more limited
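
The "state flows step to step" idea can be written out in a few lines of NumPy. This is a sketch of a plain tanh recurrence with random weights, not the Keras `SimpleRNN` implementation; the function `simple_rnn` and all array names are assumptions made for illustration.

```python
import numpy as np


def simple_rnn(inputs, w_x, w_h, b):
    """Run a tanh RNN cell over a (T, features) sequence and return the final state."""
    h = np.zeros(w_h.shape[0])
    for x_t in inputs:  # strictly sequential: step t needs the state from step t-1
        h = np.tanh(x_t @ w_x + h @ w_h + b)
    return h


rng = np.random.default_rng(0)
T, features, units = 5, 3, 4
inputs = rng.normal(size=(T, features))
w_x = rng.normal(size=(features, units))   # input-to-hidden weights
w_h = rng.normal(size=(units, units))      # hidden-to-hidden weights
b = np.zeros(units)

h_final = simple_rnn(inputs, w_x, w_h, b)
print(h_final.shape)  # (4,)
```

The loop cannot be parallelised across time steps, which is one reason recurrent layers tend to cost more than convolutions on small devices.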

## Transformers
- **Use for:** language, ViT, some multimodal tasks  
- **Idea:** self-attention mixes all positions  
- **Keras layers:** `MultiHeadAttention`, `LayerNormalization`, `Dense`  
- **TinyML notes:** memory hungry; tiny variants exist but are harder on MCUs
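
To see why self-attention is memory hungry, it helps to write one head out by hand. The sketch below is a single-head, NumPy-only illustration with random weights; it is not the Keras `MultiHeadAttention` layer, and `self_attention` and its arguments are hypothetical names.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, D) sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])               # (T, T): every position scores every other
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax numerically
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
T, D = 6, 8
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (6, 8) (6, 6)
```

The `weights` matrix has one entry per pair of positions, so its size grows with the square of the sequence length.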

## Autoencoders (AE) and Variational AEs
- **Use for:** compression, denoising, anomaly detection  
- **Idea:** encode to a latent, reconstruct the input  
- **Keras layers:** same CNN or MLP blocks plus a reconstruction head  
- **TinyML notes:** small AEs are fine; VAEs add stochastic parts and extra cost

## GANs
- **Use for:** data synthesis, augmentation  
- **Idea:** generator versus discriminator  
- **TinyML notes:** train off device; generator inference can still be heavy

## Diffusion Models
- **Use for:** high quality generative images or audio  
- **Idea:** iterative denoising from noise  
- **TinyML notes:** far too heavy for microcontrollers

## Graph Neural Networks (GNNs)
- **Use for:** graphs, molecules, road networks  
- **Idea:** message passing over nodes and edges  
- **TinyML notes:** niche and often requires custom ops

---

In [8]:
# now we build a model to predict the digit using the keras functional API
def build_functional_model(input_shape, num_classes):
    """
    Build a small VGG-style CNN using the Functional API.
    input_shape: tuple (H, W, C), for MNIST often (28, 28, 1)
    num_classes: number of output classes for classification (often 10 for MNIST)
    """
    inputs = keras.Input(shape=input_shape)         # Define the symbolic input tensor for the computation graph

    # First convolutional block: two 3×3 convs
    # Note: padding='valid' is the default, so each 3×3 conv shrinks H and W by 2.
    # If you want to keep spatial size, set padding='same'.
    x = L.Conv2D(64, (3, 3), activation="relu")(inputs)   # Extract local features with 64 filters, add nonlinearity
    x = L.Conv2D(64, (3, 3), activation="relu")(x)        # Stack another 3×3 to expand receptive field with modest params
    x = L.MaxPooling2D((2, 2))(x)                         # Downsample by 2, reduce compute, gain some translation invariance

    # Second convolutional block: increase channel depth as resolution drops
    x = L.Conv2D(128, (3, 3), activation="relu")(x)       # Learn richer features at lower spatial resolution
    x = L.Conv2D(128, (3, 3), activation="relu")(x)       # Another 3×3 for more expressive power without huge kernels

    # Classifier head: make features compact, regularize, then classify
    x = L.GlobalAveragePooling2D()(x)                     # Average each feature map over H and W, get a 128-D vector; avoids large Dense layers
    x = L.Dropout(0.5)(x)                                 # Randomly drop activations during training, reduce overfitting; inactive at inference

    outputs = L.Dense(num_classes, activation="softmax")(x)  # Map to class probabilities for single-label multiclass tasks

    return keras.Model(inputs=inputs,                      # Assemble inputs and outputs into a Model object
                       outputs=outputs,
                       name="mnist_predict_digits")


In [10]:
# call the function to build the model. we use 10 classes since we have digits 0-9
model = build_functional_model(input_shape=x_train.shape[1:], num_classes=10)
model.summary()
# keras.utils.plot_model(model, "my_first_model_with_shape_info.png", show_shapes=True)
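
The shapes and parameter count that `model.summary()` reports can be cross-checked by hand. This is a sketch that assumes the architecture exactly as defined in `build_functional_model` (stride-1 3×3 convs with `padding='valid'`, a single 2×2 max pool after the first block, and biases everywhere); the helpers `conv_valid` and `conv_params` are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Output height/width of a stride-1 convolution with padding='valid'."""
    return size - kernel + 1


def conv_params(kernel, c_in, c_out):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out


size, channels, total = 28, 1, 0
for block_filters, pool in [(64, True), (128, False)]:
    for _ in range(2):  # two 3x3 convs per block
        total += conv_params(3, channels, block_filters)
        size, channels = conv_valid(size), block_filters
    if pool:
        size //= 2  # 2x2 max pooling halves H and W

total += channels * 10 + 10  # Dense(10) on the 128-D global-average-pooled vector
print(size, channels, total)  # 8 128 260298
```

The spatial size goes 28 → 26 → 24 → 12 → 10 → 8, and global average pooling keeps the classifier head at 1,290 weights instead of the 81,930 a `Flatten` over the 8×8×128 feature map would need.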