# Cricket Image Classification
## Grid-Based Object Detection using Hand-Crafted Features

**Team Name: Team_Pravin**

This notebook demonstrates the complete pipeline for cricket object detection in images using an 8×8 grid classification approach.

### Problem Description
- Divide each 800×600 image into an 8×8 grid (64 cells of 100×75 pixels each)
- Classify each cell as:
  - 0: No object
  - 1: Ball
  - 2: Bat
  - 3: Stump
- Use only hand-crafted features (NO CNNs)
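Cells are numbered c01 to c64 in row-major order, which is the same numbering printed in Section 2. The sketch below maps a 1-based cell number to its 0-based row and column and to its pixel box in the 800×600 image, using the 100×75 cell size stated above. The helper name `cell_to_box` is hypothetical and is not part of `src/`.

```python
GRID_ROWS, GRID_COLS = 8, 8
CELL_W, CELL_H = 100, 75  # 800 / 8 and 600 / 8

def cell_to_box(cell_number: int) -> tuple[int, int, int, int, int, int]:
    """Return (row, col, x0, y0, x1, y1) for a 1-based, row-major cell number (hypothetical helper)."""
    if not 1 <= cell_number <= GRID_ROWS * GRID_COLS:
        raise ValueError(f"cell number must be in 1..64, got {cell_number}")
    row, col = divmod(cell_number - 1, GRID_COLS)  # convert to 0-based grid position
    x0, y0 = col * CELL_W, row * CELL_H
    return row, col, x0, y0, x0 + CELL_W, y0 + CELL_H

print(cell_to_box(35))  # → (4, 2, 200, 300, 300, 375)
```

For example, cell 35 (the ball cell in the Section 5 sample predictions) sits in row 4, column 2, covering x from 200 to 300 and y from 300 to 375.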

In [None]:
# Import required libraries
import os
import sys
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

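# Make the project root (the directory containing src/) importable;
# assumes the notebook is run from the project root
sys.path.insert(0, os.getcwd())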

# Import project modules
from src.preprocess import (
    load_and_preprocess_image, divide_into_grid, visualize_grid,
    TARGET_WIDTH, TARGET_HEIGHT, GRID_ROWS, GRID_COLS, CELL_WIDTH, CELL_HEIGHT
)
from src.features import (
    extract_cell_features, extract_features_from_grid, 
    extract_hog_features, extract_color_histogram
)
from src.model import GridClassifier, RandomForestSimple, evaluate_model, LABEL_NAMES
from src.utils import (
    save_predictions_to_csv, load_annotations, save_annotations,
    summarize_predictions, count_images
)

print("Modules imported successfully!")

## 1. Dataset Overview

### Requirements:
- At least 300 images with 4:3 aspect ratio
- Resolution: at least 800 × 600 pixels (every image is resized to exactly 800 × 600 during preprocessing)
- Categories: ball, bat, stump, no_object

### Directory Structure:
```
data/
├── train/
│   ├── ball/
│   ├── bat/
│   ├── stump/
│   └── no_object/
├── test/
│   ├── ball/
│   ├── bat/
│   ├── stump/
│   └── no_object/
└── annotations/
    └── annotations.json
```
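`count_images` comes from `src.utils`, and its implementation is not shown here. As a hypothetical stand-in, the sketch below walks the `data/<split>/<category>/` layout above and counts files per folder. The accepted extension set (`.jpg`, `.jpeg`, `.png`) and the `split/category` key format are assumptions.

```python
from pathlib import Path

SPLITS = ("train", "test")
CATEGORIES = ("ball", "bat", "stump", "no_object")
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of accepted extensions

def count_images_sketch(base_dir: str) -> dict[str, int]:
    """Count image files per split/category under base_dir/data (hypothetical helper)."""
    counts = {}
    data_dir = Path(base_dir) / "data"
    for split in SPLITS:
        for category in CATEGORIES:
            folder = data_dir / split / category
            n = 0
            if folder.is_dir():
                # Only count regular files whose extension is in the accepted set
                n = sum(1 for p in folder.iterdir()
                        if p.is_file() and p.suffix.lower() in IMAGE_EXTS)
            counts[f"{split}/{category}"] = n
    return counts
```

Missing folders are reported as 0 rather than raising, so the sketch also works on a partially populated dataset.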

In [None]:
# Check dataset statistics
BASE_DIR = os.getcwd()
counts = count_images(BASE_DIR)

print("Dataset Statistics:")
print("=" * 40)
total = 0
for key, count in counts.items():
    print(f"{key}: {count} images")
    total += count
print("=" * 40)
print(f"Total: {total} images")

## 2. Image Preprocessing

Each image is:
1. Validated for resolution (minimum 800×600)
2. Cropped to a 4:3 aspect ratio
3. Resized to 800×600 pixels
4. Divided into an 8×8 grid (64 cells of 100×75 pixels)
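A minimal sketch of these four steps, assuming Pillow and NumPy. The crop anchor used by `load_and_preprocess_image` is not stated, so this version assumes a center crop; the function names are hypothetical.

```python
import numpy as np
from PIL import Image

TARGET_W, TARGET_H = 800, 600
GRID = 8

def preprocess_sketch(img: Image.Image) -> np.ndarray:
    """Validate, center-crop to 4:3, and resize to 800x600 (hypothetical helper)."""
    w, h = img.size
    if w < TARGET_W or h < TARGET_H:
        raise ValueError(f"image too small: {w}x{h}, need at least {TARGET_W}x{TARGET_H}")
    # Integer comparison of w/h against 4/3 avoids floating-point drift
    if w * 3 > h * 4:  # too wide: trim left and right
        new_w = h * 4 // 3
        left = (w - new_w) // 2
        img = img.crop((left, 0, left + new_w, h))
    elif w * 3 < h * 4:  # too tall: trim top and bottom
        new_h = w * 3 // 4
        top = (h - new_h) // 2
        img = img.crop((0, top, w, top + new_h))
    return np.asarray(img.convert("RGB").resize((TARGET_W, TARGET_H)))

def split_grid_sketch(arr: np.ndarray) -> list[np.ndarray]:
    """Split an (H, W, C) array into GRID x GRID cells in row-major order."""
    h, w = arr.shape[:2]
    ch, cw = h // GRID, w // GRID
    return [arr[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            for r in range(GRID) for c in range(GRID)]
```

Cropping before resizing matters: resizing a non-4:3 image straight to 800×600 would stretch it, distorting the shapes of bats and stumps that the edge and shape features rely on.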

In [None]:
# Demonstrate grid division with a sample image
print(f"Image dimensions: {TARGET_WIDTH} × {TARGET_HEIGHT}")
print(f"Grid size: {GRID_ROWS} × {GRID_COLS} = {GRID_ROWS * GRID_COLS} cells")
print(f"Cell dimensions: {CELL_WIDTH} × {CELL_HEIGHT} pixels")

# Create a sample image for demonstration
sample_image = np.random.randint(100, 200, (TARGET_HEIGHT, TARGET_WIDTH, 3), dtype=np.uint8)

# Divide into grid
cells = divide_into_grid(sample_image)
print(f"\nNumber of cells: {len(cells)}")
print(f"Cell shape: {cells[0].shape}")

In [None]:
# Visualize grid
grid_image = visualize_grid(sample_image)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))
axes[0].imshow(sample_image)
axes[0].set_title('Original Image')
axes[0].axis('off')

axes[1].imshow(grid_image)
axes[1].set_title('Image with 8×8 Grid Overlay')
axes[1].axis('off')

plt.tight_layout()
plt.show()

# Show cell numbering
print("\nCell Numbering (1-64, row-major order):")
print("=" * 50)
for row in range(GRID_ROWS):
    cells_row = [f"c{row * GRID_COLS + col + 1:02d}" for col in range(GRID_COLS)]
    print(" | ".join(cells_row))
print("=" * 50)

## 3. Hand-Crafted Feature Extraction

For each cell, we extract:
1. **HOG Features** (9 dimensions) - Captures edge orientations
2. **Color Histogram** (48 dimensions) - RGB color distribution
3. **Edge Features** (7 dimensions) - Gradient statistics
4. **LBP Features** (256 dimensions) - Local texture patterns
5. **Shape Features** (6 dimensions) - Edge distribution characteristics
6. **Color Statistics** (18 dimensions) - Per-channel summary statistics, including mean, std, min, and max

**Total: 344 features per cell**
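The exact feature definitions live in `src/features.py`, which is not shown. The sketch below illustrates three of the listed descriptors under stated assumptions: a single-cell, 9-bin unsigned gradient-orientation histogram (a simplified HOG without block normalization), a 16-bin-per-channel RGB histogram, and a basic 8-neighbour LBP histogram with 256 bins. The function names are hypothetical.

```python
import numpy as np

def hog_cell_sketch(cell: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Magnitude-weighted histogram of unsigned gradient orientations (0 to 180 degrees)."""
    gray = cell.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist  # flat cells give an all-zero vector

def color_hist_sketch(cell: np.ndarray, bins: int = 16) -> np.ndarray:
    """Concatenated R, G, B histograms, each normalized to sum to 1."""
    feats = []
    for ch in range(3):
        h, _ = np.histogram(cell[..., ch], bins=bins, range=(0, 256))
        feats.append(h / max(h.sum(), 1))
    return np.concatenate(feats)

def lbp_hist_sketch(cell: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of 8-neighbour local binary pattern codes."""
    gray = cell.astype(np.float64).mean(axis=2)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit  # set bit if neighbour >= center
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

These three cover 313 of the 344 dimensions (9 + 48 + 256); the edge, shape, and color-statistic features are omitted from the sketch.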

In [None]:
# Extract features from a sample cell
sample_cell = cells[0]
features = extract_cell_features(sample_cell)

print(f"Feature vector dimension: {len(features)}")
# Expected per-group dimensions (must sum to the vector length above)
feature_dims = {
    "HOG": 9,
    "Color Histogram": 48,
    "Edge Features": 7,
    "LBP Features": 256,
    "Shape Features": 6,
    "Color Statistics": 18,
}
print("\nFeature breakdown:")
for name, dim in feature_dims.items():
    print(f"  - {name}: {dim} features")
print(f"  - Total: {sum(feature_dims.values())} features")
assert sum(feature_dims.values()) == len(features), "Feature breakdown does not match vector length"

In [None]:
# Visualize HOG and color-histogram features for the sample cell
# (both extractors were already imported in the first cell)

hog = extract_hog_features(sample_cell)
color_hist = extract_color_histogram(sample_cell)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# HOG histogram
axes[0].bar(range(len(hog)), hog)
axes[0].set_xlabel('Orientation Bin')
axes[0].set_ylabel('Normalized Magnitude')
axes[0].set_title('HOG Features (9 orientation bins)')

# Color histogram
colors = ['red'] * 16 + ['green'] * 16 + ['blue'] * 16
axes[1].bar(range(len(color_hist)), color_hist, color=colors)
axes[1].set_xlabel('Color Bin')
axes[1].set_ylabel('Normalized Frequency')
axes[1].set_title('Color Histogram (16 bins × 3 channels)')

plt.tight_layout()
plt.show()

## 4. Model Training

Two classifiers are implemented:
1. **GridClassifier** - Softmax regression with L2 regularization
2. **RandomForestSimple** - Simple random forest implementation

Both use the same hand-crafted features.
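`GridClassifier` is described as softmax regression with L2 regularization. The sketch below shows that technique as plain batch gradient descent in NumPy. The class name, the defaults (copied from the training cell below), and the feature-standardization step are assumptions, not the project's implementation.

```python
import numpy as np

class SoftmaxRegressionSketch:
    """Multinomial logistic regression with an L2 penalty, trained by batch gradient descent."""

    def __init__(self, n_classes=4, learning_rate=0.1, n_iterations=500, regularization=0.01):
        self.k = n_classes
        self.lr = learning_rate
        self.n_iter = n_iterations
        self.reg = regularization

    @staticmethod
    def _softmax(z):
        z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def _standardize(self, X):
        return (X - self.mean_) / self.std_

    def fit(self, X, y):
        # Assumed step: z-score features so LBP bins and gradient stats share a scale
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-8
        Xs = self._standardize(X)
        n, d = Xs.shape
        Y = np.eye(self.k)[y]  # one-hot targets, shape (n, k)
        self.W = np.zeros((d, self.k))
        self.b = np.zeros(self.k)
        for _ in range(self.n_iter):
            P = self._softmax(Xs @ self.W + self.b)
            # Gradient of mean cross-entropy + (reg / 2) * ||W||^2; bias is not penalized
            grad_W = Xs.T @ (P - Y) / n + self.reg * self.W
            grad_b = (P - Y).mean(axis=0)
            self.W -= self.lr * grad_W
            self.b -= self.lr * grad_b
        return self

    def predict_proba(self, X):
        return self._softmax(self._standardize(X) @ self.W + self.b)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)
```

Whether `GridClassifier` standardizes features internally is not shown; without some scaling, the 256 small LBP values and the larger gradient statistics would receive very uneven effective step sizes.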

In [None]:
# Demonstrate classifier training with synthetic data
np.random.seed(42)

# Generate synthetic training data
n_samples = 400  # 100 per class
n_features = 344

X_train = np.random.randn(n_samples, n_features)
y_train = np.repeat([0, 1, 2, 3], n_samples // 4)

# Add some class-specific patterns
for i in range(n_samples):
    label = y_train[i]
    X_train[i, label * 10:(label + 1) * 10] += 2  # Class-specific signal

print(f"Training data shape: {X_train.shape}")
print(f"Labels shape: {y_train.shape}")
print(f"Class distribution: {np.bincount(y_train)}")

In [None]:
# Train GridClassifier
print("Training GridClassifier...")
clf = GridClassifier(learning_rate=0.1, n_iterations=500, regularization=0.01)
clf.fit(X_train, y_train, verbose=True)

In [None]:
# Evaluate on the training data (a sanity check only: training accuracy overstates performance on unseen images)
y_pred = clf.predict(X_train)
metrics = evaluate_model(y_train, y_pred)

print("\nTraining Metrics:")
print("=" * 50)
for key, value in metrics.items():
    print(f"{key}: {value:.4f}")

In [None]:
# Save the model (create the output directory first in case it does not exist)
os.makedirs('models', exist_ok=True)
model_path = 'models/model_Team_Pravin.pkl'
clf.save(model_path)

# Load and verify
clf_loaded = GridClassifier.load(model_path)
y_pred_loaded = clf_loaded.predict(X_train)

print(f"\nPredictions match after save/load: {np.array_equal(y_pred, y_pred_loaded)}")

## 5. Output Format

The CSV output format is:
```
ImageFileName, TrainOrTest, c01, c02, ..., c64
```

Each cell value is one of:
- 0: No object
- 1: Ball
- 2: Bat
- 3: Stump
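`save_predictions_to_csv` writes this format, but its implementation is not shown. A minimal stand-in using the standard `csv` module might look like the sketch below. It assumes the header names `ImageFileName`, `TrainOrTest`, and `c01` to `c64` exactly as listed, treats the spaces after the commas in the header above as display formatting, and reuses the `{filename: (split, labels)}` shape of the sample predictions; the helper name is hypothetical.

```python
import csv
import io

N_CELLS = 64

def write_predictions_sketch(predictions: dict, fh) -> None:
    """Write {filename: (split, [64 labels])} rows to a file-like object (hypothetical helper)."""
    writer = csv.writer(fh)
    writer.writerow(["ImageFileName", "TrainOrTest"] + [f"c{i:02d}" for i in range(1, N_CELLS + 1)])
    for filename, (split, labels) in predictions.items():
        # Reject malformed rows instead of silently writing a short or invalid line
        if len(labels) != N_CELLS:
            raise ValueError(f"{filename}: expected {N_CELLS} labels, got {len(labels)}")
        if any(v not in (0, 1, 2, 3) for v in labels):
            raise ValueError(f"{filename}: labels must be in 0..3")
        writer.writerow([filename, split] + list(labels))

buf = io.StringIO()
write_predictions_sketch({"train/ball/sample_ball.jpg": ("train", [0] * 34 + [1] + [0] * 29)}, buf)
print(buf.getvalue())
```

The length check catches label lists with the wrong number of cells before they reach the CSV.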

In [None]:
# Create sample predictions for demonstration
sample_predictions = {
    'train/ball/sample_ball.jpg': (
        'train',
        [0]*34 + [1] + [0]*29  # Ball in cell 35
    ),
    'train/bat/sample_bat.jpg': (
        'train',
        [0]*19 + [2] + [0]*33 + [2] + [0]*8 + [2] + [0]  # Bat in cells 20, 54, 63
    ),
    'test/stump/sample_stump.jpg': (
        'test',
        [0]*27 + [3] + [0]*7 + [3] + [0]*28  # Stumps in cells 28, 36 (64 values total)
    ),
}

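# Save to CSV (create the output directory first in case it does not exist)
os.makedirs('outputs', exist_ok=True)
output_csv = 'outputs/predictions.csv'
save_predictions_to_csv(sample_predictions, output_csv)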

# Show summary
summarize_predictions(sample_predictions)

In [None]:
# Display the CSV content
print("CSV Output:")
print("=" * 100)
with open(output_csv, 'r') as f:
    for line in f:
        print(line.strip())

## 6. Full Pipeline

To run the complete pipeline:

```bash
# 1. Add images to the data/ directories

# 2. Create annotation template
python main.py --mode annotate

# 3. Manually edit data/annotations/annotations.json

# 4. Train the model
python main.py --mode train

# 5. Generate predictions
python main.py --mode predict

# 6. Evaluate on test data
python main.py --mode evaluate
```
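`main.py` itself is not shown. The hypothetical sketch below shows one way a `--mode` flag with the four modes above could be dispatched with `argparse`. The handler functions are placeholders that only return a description, not the project's code.

```python
import argparse

# Placeholder handlers; the real logic lives in main.py and src/
def run_annotate() -> str:
    return "create annotation template"

def run_train() -> str:
    return "train model"

def run_predict() -> str:
    return "write predictions CSV"

def run_evaluate() -> str:
    return "evaluate on test split"

HANDLERS = {
    "annotate": run_annotate,
    "train": run_train,
    "predict": run_predict,
    "evaluate": run_evaluate,
}

def main(argv=None) -> str:
    parser = argparse.ArgumentParser(description="Cricket grid classification pipeline (sketch)")
    # choices= makes argparse reject unknown modes with a usage error
    parser.add_argument("--mode", required=True, choices=sorted(HANDLERS))
    args = parser.parse_args(argv)
    return HANDLERS[args.mode]()

print(main(["--mode", "train"]))
```

Using a lookup table keeps each mode independent, which matches the step-by-step workflow above: each command can be rerun on its own, for example re-running `predict` after retraining.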

## Summary

This project implements a grid-based cricket object detection system using:

1. **Image Preprocessing**: Resize to 800×600, divide into 8×8 grid
2. **Hand-Crafted Features**: HOG, color histograms, edge features, LBP, shape features, and color statistics
3. **Classification**: Softmax regression or Random Forest
4. **Output**: CSV with cell-level predictions

**Key Constraints Satisfied**:
- ✓ Uses hand-crafted features (no CNNs)
- ✓ Images resized to 800×600 (4:3 aspect ratio)
- ✓ 8×8 grid division (64 cells)
- ✓ 4-class classification (no object, ball, bat, stump)
- ✓ Model saved as model_Team_Pravin.pkl
- ✓ Predictions in required CSV format