# CAD Rotation Model – Pre‑processing Notebook

Reusable utilities to prepare annotation *batches* for training a rotation‑aware detection model.
The notebook is idempotent: you can re‑run it after dropping in new batches.


## README / Quick‑Start

**Directory layout expected**

```
rotation/
└── batches/
    ├── batch_20250115_01/          # <- renamed input folder
    │   ├── images/
    │   │   └── default/*.png
    │   └── annotations/
    │       └── instances_default.json
    └── ...
```
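
A quick way to confirm that a batch folder matches the layout above is to test for the two expected paths. The sketch below is a hypothetical helper (`missing_layout_paths` is not part of the notebook) and assumes the `images/default` and `annotations/instances_default.json` names shown in the tree.

```python
from pathlib import Path
from typing import List

# Relative paths every batch folder is assumed to contain (taken from the tree above).
EXPECTED = ("images/default", "annotations/instances_default.json")

def missing_layout_paths(batch_dir: Path) -> List[str]:
    """Return the expected relative paths that are absent from batch_dir."""
    return [rel for rel in EXPECTED if not (batch_dir / rel).exists()]
```

An empty list means the batch folder is complete; anything else names what still has to be copied or renamed.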

> ⚠️ If your raw data are still in `rotation/batches/images/default`,  
> run section **1 – Rename batches** first.

**Requirements**

```bash
pip install pandas matplotlib pillow
```

The notebook runs entirely offline.

**Lifecycle**

1. **Rename batches** – give each batch a stable, informative folder name.  
2. **Explore JSON** – get counts of images, annotations, and categories, plus the most frequent classes.  
3. **Visual check** – overlay rotated bboxes on a random image.  
4. **(Optional) Export tidy CSV** – for downstream pipelines.


In [None]:

import os, json, shutil, random, math, datetime as dt
from pathlib import Path
from typing import List, Dict, Any
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle, Polygon

plt.rcParams['figure.dpi'] = 140  # sharper inline figs


In [None]:

# ⇩⇩ Adjust these two lines to match your project root  ⇩⇩
ROOT = Path.cwd()               # or Path('/absolute/path/to/project')
BATCHES_DIR = ROOT/'rotation'/'batches'

print('Project root:', ROOT)
print('Batches dir  :', BATCHES_DIR)

assert BATCHES_DIR.exists(), 'Path does not exist – please fix ROOT'


## 1 – Rename batches

Raw exports often land in a single folder. The helper below **moves** the `images` and `annotations` folders of each unnamed batch into a new folder called `batch_<YYYYMMDD>_<nn>`.

Feel free to adapt the scheme.
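
To make the naming scheme concrete, here is a minimal sketch that parses a folder name back into its parts. `parse_batch_name` and the `BATCH_NAME` pattern are hypothetical, and the pattern assumes the default `%Y%m%d` date stamp and an alphabetic prefix; a different scheme would need a different regular expression.

```python
import datetime as dt
import re
from typing import Optional, Tuple

# Assumed pattern: <prefix>_<YYYYMMDD>_<nn>, e.g. batch_20250115_01
BATCH_NAME = re.compile(r"^(?P<prefix>[A-Za-z]+)_(?P<date>\d{8})_(?P<index>\d{2,})$")

def parse_batch_name(name: str) -> Optional[Tuple[str, dt.date, int]]:
    """Return (prefix, date, index) or None if the name does not follow the scheme."""
    m = BATCH_NAME.match(name)
    if m is None:
        return None
    try:
        day = dt.datetime.strptime(m["date"], "%Y%m%d").date()
    except ValueError:          # eight digits that are not a real calendar date
        return None
    return m["prefix"], day, int(m["index"])
```

A parser like this is handy for sorting batches by date or for deciding whether a folder still needs renaming.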

In [None]:

def rename_batches(batches_dir: Path, prefix: str = 'batch', date_fmt: str = '%Y%m%d') -> None:
    """Rename anonymous batch folders to a canonical pattern.

    Args
    ----
    batches_dir : Path
        The directory that currently holds `images/default` and `annotations/`
        or a flat list of unnamed batches.
    prefix : str
        Prefix used in the target folder name, defaults to 'batch'.
    date_fmt : str
        `strftime` format of the date stamp. Default '%Y%m%d'.

    Effect
    ------
    Creates new directory `batches_dir/<prefix>_<date>_<index:02d>` and moves
    `images` and `annotations` inside. Folders whose name already starts with
    `<prefix>_` are left untouched, so re-running is safe.
    """
    today = dt.datetime.today().strftime(date_fmt)
    index = 1
    # candidates: sub-folders that are neither loose data nor already renamed
    unnamed = sorted(p for p in batches_dir.iterdir()
                     if p.is_dir() and p.name not in ('images', 'annotations')
                     and not p.name.startswith(f"{prefix}_"))
    # also handle loose images/annotations sitting directly
    if (batches_dir/'images').exists() and (batches_dir/'annotations').exists():
        unnamed.append(batches_dir)
    if not unnamed:
        print('Nothing to rename – folders already structured ✔️')
        return
    for src in unnamed:
        # nothing to move if the folder holds neither images nor annotations
        if not any((src/sub).exists() for sub in ('images', 'annotations')):
            continue
        # advance to the next index not taken by an existing batch folder
        target = batches_dir/f"{prefix}_{today}_{index:02d}"
        while target.exists():
            index += 1
            target = batches_dir/f"{prefix}_{today}_{index:02d}"
        index += 1
        target.mkdir()
        for sub in ('images', 'annotations'):
            sub_path = src/sub
            if sub_path.exists():
                shutil.move(str(sub_path), str(target/sub))
        # remove empty src folder if it wasn't batches_dir
        if src != batches_dir:
            try:
                src.rmdir()
            except OSError:
                pass
        print(f"Moved {src} -> {target}")


In [None]:

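# -- Run the rename -----------------------------------------------------------
# rename_batches moves folders on disk and has no dry-run mode, so the call is
# left commented out. Uncomment to execute.
# rename_batches(BATCHES_DIR)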
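
Because the rename moves folders on disk, it can help to list the planned moves first. The sketch below is a hypothetical `plan_rename` helper, not part of the notebook. It mirrors the selection logic of `rename_batches` but only returns `(source, target)` pairs, and it assumes that folders named `<prefix>_<digits>_<digits>` are already canonical and should be skipped.

```python
import datetime as dt
import re
from pathlib import Path
from typing import List, Tuple

def plan_rename(batches_dir: Path, prefix: str = "batch",
                date_fmt: str = "%Y%m%d") -> List[Tuple[Path, Path]]:
    """Return the (source, target) pairs a rename would produce, without moving anything."""
    today = dt.datetime.today().strftime(date_fmt)
    # Assumption: names like <prefix>_<digits>_<digits> are already canonical.
    canonical = re.compile(rf"^{re.escape(prefix)}_\d+_\d+$")
    sources = sorted(
        p for p in batches_dir.iterdir()
        if p.is_dir()
        and p.name not in ("images", "annotations")
        and not canonical.match(p.name)
    )
    # Loose images/ and annotations/ directly under batches_dir form one more batch.
    if (batches_dir / "images").exists() and (batches_dir / "annotations").exists():
        sources.append(batches_dir)

    plan, index = [], 1
    for src in sources:
        target = batches_dir / f"{prefix}_{today}_{index:02d}"
        while target.exists():          # skip indices already used on disk
            index += 1
            target = batches_dir / f"{prefix}_{today}_{index:02d}"
        plan.append((src, target))
        index += 1
    return plan
```

Printing each pair, for example `for src, dst in plan_rename(BATCHES_DIR): print(src, '->', dst)`, gives a preview that leaves the filesystem untouched.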


## 2 – Explore a COCO JSON file

In [None]:

def load_coco(json_path: Path) -> Dict[str, Any]:
    with open(json_path, 'r', encoding='utf-8') as f:
        coco = json.load(f)
    return coco

def coco_summary(coco: Dict[str, Any]) -> None:
    print(f"Images      : {len(coco['images']):>5}")
    print(f"Annotations : {len(coco['annotations']):>5}")
    print(f"Categories  : {len(coco['categories']):>5}\n")
    cat_map = {c['id']: c['name'] for c in coco['categories']}
    counts = {}
    for ann in coco['annotations']:
        name = cat_map[ann['category_id']]
        counts[name] = counts.get(name, 0) + 1
    print('Top classes:')
    for k, v in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"  {k:<25} {v}")

def coco_to_df(coco: Dict[str, Any]) -> pd.DataFrame:
    img_lookup = {img['id']: img for img in coco['images']}
    rows = []
    for ann in coco['annotations']:
        img = img_lookup[ann['image_id']]
        row = {
            'image_id': ann['image_id'],
            'file_name': img['file_name'],
            'width': img['width'],
            'height': img['height'],
            'category_id': ann['category_id'],
            'bbox': ann['bbox'],
            'area': ann.get('area', None),
            'rotation': ann.get('attributes', {}).get('rotation', 0.0),
            'iscrowd': ann.get('iscrowd', 0)
        }
        rows.append(row)
    return pd.DataFrame(rows)
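
The summary above does not say how many boxes are actually rotated. A minimal sketch of such a count follows; `count_rotated` is a hypothetical helper, and it assumes the angle is stored in degrees under `attributes.rotation`, the same key `coco_to_df` reads, with a missing value meaning 0.

```python
from typing import Any, Dict

def count_rotated(coco: Dict[str, Any], tol: float = 1e-6) -> Dict[str, int]:
    """Count annotations whose rotation differs from 0 degrees (modulo 360)."""
    rotated = 0
    for ann in coco["annotations"]:
        rot = float(ann.get("attributes", {}).get("rotation", 0.0)) % 360.0
        # distance to the nearest multiple of 360, so 360 and -0.0 count as unrotated
        if min(rot, 360.0 - rot) > tol:
            rotated += 1
    total = len(coco["annotations"])
    return {"rotated": rotated, "axis_aligned": total - rotated}
```

Calling `count_rotated(coco)` on a loaded file returns both counts in one dictionary.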


In [None]:

# pick the first batch (alphabetical order); sorted() returns a list, so index it
first_batch = sorted(p for p in BATCHES_DIR.iterdir() if p.is_dir())[0]
json_path = first_batch/'annotations'/'instances_default.json'
coco = load_coco(json_path)
coco_summary(coco)

df = coco_to_df(coco)
df.head()
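
For the optional tidy CSV export listed in the lifecycle, one possible approach is to split the list-valued `bbox` column into scalar columns before writing. The sketch below assumes a DataFrame shaped like the output of `coco_to_df`; `to_tidy` and the output file name are hypothetical.

```python
import pandas as pd

def to_tidy(df: pd.DataFrame) -> pd.DataFrame:
    """Replace the list-valued `bbox` column with scalar x, y, w, h columns."""
    boxes = pd.DataFrame(df["bbox"].tolist(), columns=["x", "y", "w", "h"], index=df.index)
    return pd.concat([df.drop(columns="bbox"), boxes], axis=1)

# Usage in the notebook (file name is an assumption, adjust as needed):
# to_tidy(df).to_csv(first_batch / "annotations_tidy.csv", index=False)
```

Scalar columns survive a CSV round trip, whereas a list stored in one cell is written as a string and has to be parsed again on load.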


## 3 – Visual sanity check

In [None]:

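def show_image_with_annotations(image_row: pd.Series, anns: pd.DataFrame, images_dir: Path,
                                show_bbox=True, show_seg=False, alpha=0.3) -> None:
    """Display a single image with its rotated bboxes (approximate overlay).

    Boxes are drawn with `Rectangle(angle=...)`, which rotates about the box's
    (x, y) corner, so the overlay is only approximate when the stored rotation
    refers to the box centre. `show_seg` and `alpha` are accepted but currently
    unused: `anns` carries no segmentation column.
    """
    img_path = images_dir / image_row['file_name']
    if not img_path.exists():
        print('Image not found:', img_path)
        return
    img = plt.imread(img_path)

    fig, ax = plt.subplots(figsize=(6,6))
    ax.imshow(img)
    ax.axis('off')

    # all annotations that belong to the selected image
    subset = anns[anns['image_id'] == image_row['image_id']]
    for _, ann in subset.iterrows():
        x, y, w, h = ann['bbox']
        rot = ann['rotation']
        if show_bbox:
            # flip the sign of `rot` if overlays appear rotated the wrong way
            rect = Rectangle((x, y), w, h, angle=-rot,
                             linewidth=1.2, fill=False)
            ax.add_patch(rect)
        ax.text(x, y, str(ann['category_id']), fontsize=6, color='yellow')
    plt.show()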

# Example: pick the image that a randomly sampled annotation belongs to
sample_row = df.sample(1, random_state=42).iloc[0]
show_image_with_annotations(sample_row, df, first_batch/'images'/'default')


## 4 – Debugging tips and next steps

- Add `assert` statements after every transformation.
- Use `df.query()` to inspect edge cases (e.g. large rotation angles).
- When overlays look wrong, print the raw `bbox` and `rotation` for that id.
- Consider writing unit tests with `pytest` if the pipeline will grow.
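
The first two tips can be sketched as follows. Both helpers are hypothetical, assume a DataFrame shaped like the output of `coco_to_df`, and use arbitrary thresholds (a ±360° sanity bound and a 90° cut-off) that should be adapted to the data.

```python
import pandas as pd

def validate_annotations(df: pd.DataFrame) -> None:
    """Raise AssertionError when a basic invariant of the annotation table is violated."""
    widths = df["bbox"].map(lambda b: b[2])
    heights = df["bbox"].map(lambda b: b[3])
    assert (widths > 0).all() and (heights > 0).all(), "non-positive box size"
    assert df["rotation"].between(-360, 360).all(), "rotation outside [-360, 360]"
    assert df["file_name"].notna().all(), "missing file name"

def large_rotations(df: pd.DataFrame, limit: float = 90.0) -> pd.DataFrame:
    """Rows whose rotation magnitude exceeds `limit` degrees, selected via df.query()."""
    lo, hi = -limit, limit
    return df.query("rotation < @lo or rotation > @hi")
```

Both functions also work as the body of a `pytest` test: build a small DataFrame by hand, call `validate_annotations`, and compare `large_rotations` against the rows you expect.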