# Chapter 5: Labeling and Annotation Strategies

In this notebook, we use small simulated datasets to explore how labeling decisions, such as label noise and single- versus multi-label annotation, affect model performance.


In [3]:
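# Simulated manual labeling example

import pandas as pd
import numpy as np

# Fix the random seed so the simulated labels are reproducible
np.random.seed(42)

# Create a fake labeled dataset: 50 image filenames, one class label each
image_ids = [f"img_{i}.jpg" for i in range(50)]
labels = np.random.choice(["cat", "dog"], size=50)

df = pd.DataFrame({
    "filename": image_ids,
    "label": labels
})

df.head()  # not rendered here: Jupyter only displays a cell's last expression

# Demonstrate label noise

# Overwrite the labels of rows 0-3 with "dog" (.loc slicing is inclusive).
# Any of these rows that were originally "cat" become mislabeled.
df.loc[0:3, "label"] = "dog"
df["label"].value_counts()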


label
dog    30
cat    20
Name: count, dtype: int64

Note on label noise: even a small fraction of incorrect labels can degrade model quality, because the model treats every training label as ground truth.
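The cell above overwrites rows with "dog", so any row that was already "dog" is not actually corrupted. A more controlled way to study noise is to flip a fixed fraction of labels at random. The sketch below assumes exactly two classes, "cat" and "dog"; the helper `flip_labels` and its `rate` and `seed` parameters are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def flip_labels(labels: pd.Series, rate: float, seed: int = 0):
    """Return (noisy_labels, flipped_mask) with round(rate * n) labels swapped.

    Hypothetical helper: assumes exactly two classes, "cat" and "dog".
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    k = int(round(rate * n))
    # Choose k distinct positions to corrupt
    flipped = np.zeros(n, dtype=bool)
    flipped[rng.choice(n, size=k, replace=False)] = True
    values = labels.to_numpy()
    # Swap cat <-> dog everywhere, then keep the swap only where flipped is True
    swapped = np.where(values == "cat", "dog", "cat")
    noisy = pd.Series(np.where(flipped, swapped, values),
                      index=labels.index, name=labels.name)
    return noisy, flipped


# Usage: corrupt 10% of 50 simulated labels
np.random.seed(42)
clean = pd.Series(np.random.choice(["cat", "dog"], size=50), name="label")
noisy, flipped = flip_labels(clean, rate=0.1)
print(flipped.sum())           # 5
print((noisy != clean).sum())  # 5: every selected label changed class
```

Using a separate `np.random.default_rng(seed)` keeps the noise pattern reproducible regardless of how many draws the global `np.random` state has already made, and the returned mask records exactly which rows are wrong, which is useful when comparing models trained on clean and noisy labels.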


In [4]:
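# Multi-label example: each image gets independent binary labels

df_multi = pd.DataFrame({
    "filename": image_ids,
    "is_cat": np.random.randint(0, 2, 50),     # 0 or 1 (upper bound is exclusive)
    "is_outdoor": np.random.randint(0, 2, 50)  # 0 or 1, drawn independently of is_cat
})

df_multi.head()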


    filename  is_cat  is_outdoor
0  img_0.jpg       0           0
1  img_1.jpg       1           1
2  img_2.jpg       0           1
3  img_3.jpg       1           1
4  img_4.jpg       0           1
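The difference between multi-class and multi-label targets is easiest to see in how they are encoded. A multi-class column such as `label` one-hot encodes to exactly one 1 per row, while independent binary columns such as `is_cat` and `is_outdoor` can have zero, one, or several 1s per row. The small frames below are made up for illustration; `pd.get_dummies` is the standard pandas one-hot encoder.

```python
import pandas as pd

# Multi-class: exactly one label per image (mutually exclusive classes)
multi_class = pd.DataFrame({
    "filename": ["img_0.jpg", "img_1.jpg", "img_2.jpg"],
    "label": ["cat", "dog", "cat"],
})
one_hot = pd.get_dummies(multi_class["label"], dtype=int)
print(one_hot)
# Every row of a one-hot encoding sums to exactly 1
print(one_hot.sum(axis=1).tolist())  # [1, 1, 1]

# Multi-label: each binary column answers an independent yes/no question
multi_label = pd.DataFrame({
    "filename": ["img_0.jpg", "img_1.jpg", "img_2.jpg"],
    "is_cat": [0, 1, 1],
    "is_outdoor": [0, 1, 0],
})
label_cols = ["is_cat", "is_outdoor"]
# Row sums can be 0, 1, or 2: an image may have no labels, one, or both
print(multi_label[label_cols].sum(axis=1).tolist())  # [0, 2, 1]
```

In modeling terms, multi-class targets are usually paired with a softmax over the classes, while multi-label targets are usually paired with an independent sigmoid output per label.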


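# Discussion

- Multi-class vs. multi-label: in multi-class classification, each example gets exactly one label from a set of mutually exclusive classes (the cat/dog `label` column in In [3]). In multi-label classification, each example can carry several labels at once (the independent `is_cat` and `is_outdoor` columns in In [4]).
- Bounding boxes (JSON format): for object detection, each object in an image is annotated with a class and a rectangular box, commonly stored as JSON records keyed by the image filename.
- Segmentation masks (pixel-wise labels): every pixel is assigned a class, typically stored as an array or image with the same height and width as the original image.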
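Both annotation types above can be sketched with the standard `json` module and NumPy. The JSON layout here is a hypothetical, simplified schema: the field names `boxes`, `bbox`, `category`, `width`, and `height` are assumptions, and the box uses an `[x, y, width, height]` convention with `(x, y)` as the top-left corner, as in the COCO format. The class-ID mapping is also hypothetical, and the mask is filled with rectangles only to keep the example small; real segmentation masks follow object outlines.

```python
import json

import numpy as np


def box_area(bbox):
    # bbox is [x, y, width, height] with (x, y) the top-left corner
    _, _, w, h = bbox
    return w * h


# --- Bounding boxes (hypothetical JSON schema) ---
annotation_json = """
{
  "filename": "img_0.jpg",
  "width": 8,
  "height": 6,
  "boxes": [
    {"category": "cat", "bbox": [1, 1, 3, 2]},
    {"category": "dog", "bbox": [5, 2, 2, 3]}
  ]
}
"""
ann = json.loads(annotation_json)

for box in ann["boxes"]:
    print(box["category"], box["bbox"], "area =", box_area(box["bbox"]))

# --- Segmentation mask: one class ID per pixel, shape (height, width) ---
CLASS_IDS = {"background": 0, "cat": 1, "dog": 2}  # hypothetical mapping
mask = np.zeros((ann["height"], ann["width"]), dtype=np.uint8)

# Illustration only: paint each box region with its class ID
for box in ann["boxes"]:
    x, y, w, h = box["bbox"]
    mask[y:y + h, x:x + w] = CLASS_IDS[box["category"]]

print(mask)
# Count how many pixels carry each class ID
ids, counts = np.unique(mask, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))  # {0: 36, 1: 6, 2: 6}
```

Note the indexing order: NumPy arrays are indexed `[row, column]`, so the box's `y` and `height` select rows and its `x` and `width` select columns.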