# Phase 1 Batch Processing: Mathematical Slicing

This notebook implements the Phase 1 pipeline, which takes the raw scrambled puzzle images and segments them into individual pieces using Mathematical Slicing.

## Method
- **Input**: `Jigsaw Puzzle Dataset/Gravity Falls`
- **Logic**: Looks up the grid size $N$ for each category folder (e.g. `puzzle_4x4` -> $N=4$), then slices the image into an exact $N \times N$ grid.
- **Output**: `phase1_batch_output/`
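
The batch loop in this notebook uses a fixed lookup table for the three categories. As a minimal sketch of deriving $N$ directly from a folder name instead, the hypothetical helper `grid_size_from_folder` below (not part of the notebook) parses names of the form `puzzle_NxN` and rejects anything else, including non-square grids:

```python
import re

# Hypothetical helper, not part of the original notebook.
# Assumes category folders are named like "puzzle_4x4".
_FOLDER_PATTERN = re.compile(r"puzzle_(\d+)x(\d+)")

def grid_size_from_folder(folder_name):
    """Return N for a folder named 'puzzle_NxN', or raise ValueError."""
    match = _FOLDER_PATTERN.fullmatch(folder_name)
    if match is None:
        raise ValueError(f"Unrecognized folder name: {folder_name!r}")
    rows, cols = int(match.group(1)), int(match.group(2))
    if rows != cols or rows == 0:
        raise ValueError(f"Expected a square, non-empty grid: {folder_name!r}")
    return rows

print(grid_size_from_folder("puzzle_4x4"))  # 4
```

Parsing the name would let new grid sizes work without editing a table, at the cost of depending on the naming convention being followed exactly.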

## Key Refinement
Previously, we used projection profiles to detect lines. This was replaced by strict mathematical slicing because the puzzle images are synthesized digitally without gaps, so exact division is more robust and faster.
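
To illustrate why exact division leaves no gaps, here is a small sketch of the boundary computation for one axis. The helper name `grid_boundaries` is hypothetical, and it uses integer floor division rather than the float step sizes used in the notebook's slicing function. The idea is the same: neighbouring pieces share a boundary, so every pixel lands in exactly one piece even when the length is not divisible by $N$.

```python
def grid_boundaries(length, n):
    """Return n + 1 cut positions that split `length` pixels into n spans."""
    if n <= 0:
        raise ValueError("n must be positive")
    # floor(k * length / n) for k = 0..n; the last boundary is exactly `length`
    return [(k * length) // n for k in range(n + 1)]

bounds = grid_boundaries(10, 4)
widths = [b - a for a, b in zip(bounds, bounds[1:])]
print(bounds)  # [0, 2, 5, 7, 10]
print(widths)  # [2, 3, 2, 3]
```

In this example the spans differ by at most one pixel and their widths sum to the full length, which is the property the slicing step relies on.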

In [1]:
import cv2
import numpy as np
import os
import glob
import shutil

In [2]:
def slice_mathematically(image_path, output_dir, N):
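    """
    Slices the image into an N x N grid by dividing its width and height evenly.
    If a dimension is not divisible by N, piece sizes may differ by one pixel;
    the last row and column always extend to the image edge.
    Also writes a `detected_grid.jpg` overlay showing the cut lines.
    """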
    os.makedirs(output_dir, exist_ok=True)
    
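    # 1. Load Image
    original = cv2.imread(image_path)
    if original is None:
        # cv2.imread returns None for missing or unreadable files; raise so the
        # batch loop reports the failure instead of counting it as processed
        raise ValueError(f"Could not read image at {image_path}")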
    
    h, w = original.shape[:2]
    
    # Calculate step sizes
    # Steps are kept as floats; each boundary is truncated to an integer only
    # when it is used, so rounding error does not accumulate across the grid
    step_x = w / N
    step_y = h / N
    
    piece_idx = 0
    
    # Generate pieces
    for i in range(N):
        for j in range(N):
            # Calculate coordinates
            y1 = int(i * step_y)
            y2 = int((i + 1) * step_y)
            x1 = int(j * step_x)
            x2 = int((j + 1) * step_x)
            
            # Ensure we reach the end exactly on the last piece
            if i == N - 1: y2 = h
            if j == N - 1: x2 = w
            
            piece = original[y1:y2, x1:x2]
            
            cv2.imwrite(os.path.join(output_dir, f"piece_{piece_idx}.jpg"), piece)
            piece_idx += 1
    
    # Visualization of the grid
    mask_vis = original.copy()
    # Draw vertical lines
    for j in range(1, N):
        x = int(j * step_x)
        cv2.line(mask_vis, (x, 0), (x, h), (0, 0, 255), 2)
    # Draw horizontal lines
    for i in range(1, N):
        y = int(i * step_y)
        cv2.line(mask_vis, (0, y), (w, y), (0, 0, 255), 2)
        
    cv2.imwrite(f"{output_dir}/detected_grid.jpg", mask_vis)

In [3]:
def process_dataset(root_dir, output_root):
    """
    Walks through the dataset and processes all jpg images using mathematical slicing.
    """
    print(f"Starting mathematical slicing batch processing from {root_dir}...")
    
    # Categories: puzzle_2x2, puzzle_4x4, puzzle_8x8
    # Map category to grid size N
    categories = {
        "puzzle_2x2": 2,
        "puzzle_4x4": 4,
        "puzzle_8x8": 8
    }
    
    total_processed = 0
    
    for cat, N in categories.items():
        cat_path = os.path.join(root_dir, cat)
        if not os.path.exists(cat_path):
            print(f"Skipping {cat}, directory not found.")
            continue
            
        print(f"Processing category: {cat} with Grid Size N={N}")
        # Sort so the processing order is the same on every run
        images = sorted(glob.glob(os.path.join(cat_path, "*.jpg")))
        
        for img_path in images:
            img_name = os.path.splitext(os.path.basename(img_path))[0]
            target_dir = os.path.join(output_root, cat, img_name)
            
            try:
                # Use mathematical slicing instead of projection solver
                slice_mathematically(img_path, target_dir, N)
                total_processed += 1
                if total_processed % 50 == 0:
                    print(f"Processed {total_processed} images...")
            except Exception as e:
                print(f"Failed to process {img_path}: {e}")

    print(f"Batch processing complete. {total_processed} images processed.")

In [4]:
if __name__ == "__main__":
    dataset_root = r"Jigsaw Puzzle Dataset/Gravity Falls"
    output_root = "phase1_batch_output"
    
    if os.path.exists(dataset_root):
        process_dataset(dataset_root, output_root)
    else:
        print(f"Dataset root not found: {dataset_root}")

Starting mathematical slicing batch processing from Jigsaw Puzzle Dataset/Gravity Falls...
Processing category: puzzle_2x2 with Grid Size N=2
Processed 50 images...
Processed 100 images...
Processing category: puzzle_4x4 with Grid Size N=4
Processed 150 images...
Processed 200 images...
Processing category: puzzle_8x8 with Grid Size N=8
Processed 250 images...
Processed 300 images...
Batch processing complete. 330 images processed.
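
A quick sanity check on a finished run, assuming the output layout written by the cells above (`<output_root>/<category>/<image_name>/piece_*.jpg`): each image folder should contain $N^2$ pieces. The helper `find_incomplete_folders` is a hypothetical addition, not part of the original notebook.

```python
import os

def find_incomplete_folders(output_root, categories):
    """Return (folder, found, expected) for image folders with a wrong piece count.

    `categories` maps a category folder name to its grid size N.
    """
    problems = []
    for cat, n in categories.items():
        cat_dir = os.path.join(output_root, cat)
        if not os.path.isdir(cat_dir):
            continue  # category was never produced; nothing to check
        for name in sorted(os.listdir(cat_dir)):
            img_dir = os.path.join(cat_dir, name)
            if not os.path.isdir(img_dir):
                continue
            # Count only piece files; detected_grid.jpg is ignored
            found = sum(
                1 for f in os.listdir(img_dir)
                if f.startswith("piece_") and f.endswith(".jpg")
            )
            if found != n * n:
                problems.append((img_dir, found, n * n))
    return problems
```

For example, `find_incomplete_folders("phase1_batch_output", {"puzzle_2x2": 2, "puzzle_4x4": 4, "puzzle_8x8": 8})` returns an empty list when every image folder holds the expected number of pieces.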
