# Building a Multi-Output Detection Model
- Build a model with multiple outputs
- Combine regression and classification loss
- Train a simplified detection model
- Understand multi-task learning

## Why Build a Simplified Model?
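- Modern detectors (e.g., RetinaNet, YOLO) are complex: they predict many boxes of many classes per image.
- Here, we build a simplified version that predicts one class and one box per image. This exposes the core structure: a backbone CNN whose features feed two parallel heads, a classification head and a bounding-box regression head.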


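```python
import tensorflow as tf
import tensorflow_datasets as tfds

tf.random.set_seed(42)

# Load the PASCAL VOC 2007 training split and its metadata
dataset, info = tfds.load(
    "voc/2007",
    split="train",
    with_info=True
)

num_classes = info.features["objects"]["label"].num_classes
class_names = info.features["objects"]["label"].names

print("Dataset loaded.")
print("Number of classes:", num_classes)

# Create Small Subset (first 500 images, so training stays fast on CPU)

small_ds = dataset.take(500)

# Preprocessing
# We simplify: resize each image to 224x224 and keep only ONE object per image (for clarity)

def preprocess(sample):
    # Resize to the backbone's input size (returns float32 in [0, 255])
    image = tf.image.resize(sample["image"], (224, 224))
    # MobileNetV2's ImageNet weights expect inputs scaled to [-1, 1]
    image = tf.keras.applications.mobilenet_v2.preprocess_input(image)

    # TFDS VOC boxes are normalized [ymin, xmin, ymax, xmax] in [0, 1],
    # so they remain valid after resizing
    box = sample["objects"]["bbox"][0]
    label = sample["objects"]["label"][0]

    return image, {
        "class_output": tf.one_hot(label, num_classes),
        "box_output": box
    }

train_ds = (
    small_ds
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(16)
    .prefetch(tf.data.AUTOTUNE)
)
```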


```
Dataset loaded.
Number of classes: 20
```
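TFDS stores each VOC box as normalized `[ymin, xmin, ymax, xmax]` values, which is hard to eyeball. When inspecting a target (or, later, a prediction) on the 224×224 image, it can help to convert it to pixel corners. The following is a minimal pure-Python sketch assuming that layout; `to_pixel_box` is a hypothetical helper, and the `(x_min, y_min, x_max, y_max)` output order is an assumption matching what many drawing tools expect.

```python
def to_pixel_box(box, height=224, width=224):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel
    (x_min, y_min, x_max, y_max) corners (hypothetical helper)."""
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )

# A box covering the central half of the image along each axis
print(to_pixel_box([0.25, 0.25, 0.75, 0.75]))  # (56, 56, 168, 168)
```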


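```python
# Build Model

# Pretrained MobileNetV2 backbone without its ImageNet classifier
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet"
)

# Freeze the backbone: only the two new heads are trained
base_model.trainable = False

# Collapse the 7x7x1280 feature map into one 1280-d vector per image
x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)

# Classification head: one probability per VOC class
class_output = tf.keras.layers.Dense(
    num_classes,
    activation="softmax",
    name="class_output"
)(x)

# Box regression head: 4 linear outputs for [ymin, xmin, ymax, xmax]
box_output = tf.keras.layers.Dense(
    4,
    name="box_output"
)(x)

# One shared input, two outputs: a multi-task model
model = tf.keras.Model(
    inputs=base_model.input,
    outputs=[class_output, box_output]
)
```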
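The two heads share one pooled feature vector and differ only in their output size and activation. The following NumPy sketch mimics what `GlobalAveragePooling2D` and the two `Dense` heads compute in a forward pass. It assumes the 7×7×1280 feature map MobileNetV2 produces for a 224×224 input; the features and head weights are random stand-ins (hypothetical values, not the trained model's).

```python
import numpy as np

rng = np.random.default_rng(0)
batch, feat_dim, num_classes = 2, 1280, 20

# Random stand-in for the backbone's 7x7x1280 feature maps
feature_maps = rng.normal(size=(batch, 7, 7, feat_dim))

# GlobalAveragePooling2D: average over the two spatial axes
pooled = feature_maps.mean(axis=(1, 2))  # shape (2, 1280)

# Randomly initialized weights for the two heads (illustrative only)
w_cls = rng.normal(scale=0.01, size=(feat_dim, num_classes))
b_cls = np.zeros(num_classes)
w_box = rng.normal(scale=0.01, size=(feat_dim, 4))
b_box = np.zeros(4)

def softmax(z):
    # Subtract the row max for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Classification head: softmax over classes, each row sums to 1
class_probs = softmax(pooled @ w_cls + b_cls)  # shape (2, 20)
# Box head: plain linear outputs, not constrained to [0, 1]
box_preds = pooled @ w_box + b_box             # shape (2, 4)

print(class_probs.shape, box_preds.shape)  # (2, 20) (2, 4)
```

Because the box head has no activation, nothing forces its outputs into `[0, 1]`; the MSE loss alone pulls them toward the normalized targets.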


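```python
# Compile

model.compile(
    optimizer="adam",
    # One loss per output, matched by layer name; Keras minimizes their sum
    # (each weighted 1.0 unless loss_weights is passed)
    loss={
        "class_output": "categorical_crossentropy",
        "box_output": "mse"
    },
    # Accuracy only evaluates the classification head
    metrics={
        "class_output": "accuracy"
    }
)
```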
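Keras trains this model by minimizing a single number: the per-output losses added together. The following NumPy sketch recomputes that sum by hand for a made-up two-sample batch with three classes. It mirrors the default behavior (mean over the batch, weight 1.0 per output) but is an approximation that ignores details such as Keras's exact internal clipping constant. The second total uses a hypothetical box weight of 10, which in Keras would be set through the `loss_weights` argument of `compile`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Per-sample -sum(y * log p), clipped to avoid log(0), averaged over the batch
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def mse(y_true, y_pred):
    # Per-sample mean over the 4 coordinates, averaged over the batch
    return float(np.mean(np.mean((y_true - y_pred) ** 2, axis=-1)))

def total_loss(cls_true, cls_pred, box_true, box_pred, w_cls=1.0, w_box=1.0):
    # Weighted sum of the two task losses (both weights default to 1.0)
    return (w_cls * categorical_crossentropy(cls_true, cls_pred)
            + w_box * mse(box_true, box_pred))

# Toy batch: 2 samples, 3 classes (made-up values for illustration)
cls_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
cls_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])
box_true = np.array([[0.1, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])
box_pred = np.array([[0.2, 0.2, 0.5, 0.6], [0.0, 0.0, 1.0, 1.0]])

ce = categorical_crossentropy(cls_true, cls_pred)
box = mse(box_true, box_pred)
print(f"classification loss: {ce:.4f}")
print(f"box loss:            {box:.5f}")
print(f"total (1, 1):        {total_loss(cls_true, cls_pred, box_true, box_pred):.4f}")
print(f"total (1, 10):       {total_loss(cls_true, cls_pred, box_true, box_pred, w_box=10.0):.4f}")
```

On normalized coordinates the box error is much smaller than the cross-entropy term, so with equal weights the classification task can dominate the gradients; re-weighting is one common lever for balancing the two tasks.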


```python
# Train (CPU-safe: frozen backbone, 500 images, 3 epochs)

history = model.fit(train_ds, epochs=3)
```


```
Epoch 1/3
Epoch 2/3
Epoch 3/3
```
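The `accuracy` metric only scores the classification head, so nothing in the training run measures how well the predicted boxes line up with the targets. A standard localization measure is intersection over union (IoU). The following is a minimal pure-Python sketch for the normalized `[ymin, xmin, ymax, xmax]` layout; `box_iou` is a hypothetical helper, not a Keras API. It also guards against inverted boxes (e.g. `ymin > ymax`), which the unconstrained linear box head can produce.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin_a, xmin_a, ymax_a, xmax_a = box_a
    ymin_b, xmin_b, ymax_b, xmax_b = box_b

    # Overlap rectangle; height/width clamp to 0 when the boxes do not intersect
    inter_h = max(0.0, min(ymax_a, ymax_b) - max(ymin_a, ymin_b))
    inter_w = max(0.0, min(xmax_a, xmax_b) - max(xmin_a, xmin_b))
    intersection = inter_h * inter_w

    # Clamp side lengths so an inverted (degenerate) box has zero area
    area_a = max(0.0, ymax_a - ymin_a) * max(0.0, xmax_a - xmin_a)
    area_b = max(0.0, ymax_b - ymin_b) * max(0.0, xmax_b - xmin_b)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

print(box_iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]))    # 1.0
print(box_iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.25, 0.5, 0.75]))  # about 0.333
```

Averaging this score over the `box_output` predictions and their targets would give a rough localization counterpart to the classification accuracy.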


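## Discussion
- Why use MSE for boxes? Bounding-box prediction is a regression task: the head outputs four continuous coordinates (normalized `[ymin, xmin, ymax, xmax]`), so a regression loss such as mean squared error fits. Cross-entropy, by contrast, suits the discrete class label.
- Why separate losses? Detection is a multi-task learning problem, so each output gets a loss suited to its task. Keras then minimizes their weighted sum (each weight is 1.0 unless `loss_weights` is set):
  - Task 1: Classification (which object is in the image)
  - Task 2: Localization (where that object is)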
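For a batch of $N$ images, the two-task objective the compiled model minimizes can be written out as follows. This assumes Keras's defaults (mean over the batch, $\lambda_{\text{cls}} = \lambda_{\text{box}} = 1$). Here $C = 20$ is the number of VOC classes, $y_{i,c}$ is the one-hot label, $\hat{p}_{i,c}$ is the softmax output, and $b_{i,k}$, $\hat{b}_{i,k}$ are the true and predicted normalized box coordinates:

$$
\mathcal{L} = \lambda_{\text{cls}} \cdot \frac{1}{N}\sum_{i=1}^{N}\Big(-\sum_{c=1}^{C} y_{i,c}\,\log \hat{p}_{i,c}\Big) + \lambda_{\text{box}} \cdot \frac{1}{N}\sum_{i=1}^{N}\frac{1}{4}\sum_{k=1}^{4}\big(b_{i,k} - \hat{b}_{i,k}\big)^2
$$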

## Reflection Questions
- Why might this model perform poorly compared to SSD?
- What happens if we unfreeze the backbone?
- Why is predicting multiple objects harder?