# License Plate Training Setup and Dataset

This notebook handles environment verification and Indonesian license plate dataset acquisition for YOLOv8 training.

## Tasks:
- [x] Verify GPU availability
- [ ] Check CUDA version compatibility
- [x] Download Indonesian license plate datasets
- [x] Verify dataset integrity
- [x] Initial dataset statistics

## 1. Environment Verification

In [12]:
%pip install seaborn

import sys
import torch
import torchvision
import ultralytics
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import os
from datetime import datetime

print("Environment Check:")
print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
print(f"Ultralytics version: {ultralytics.__version__}")
print(f"OpenCV version: {cv2.__version__}")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

Note: you may need to restart the kernel to use updated packages.
Environment Check:
Python version: 3.13.2 (tags/v3.13.2:4f8bb39, Feb  4 2025, 15:23:48) [MSC v.1942 64 bit (AMD64)]
PyTorch version: 2.7.1+cpu
Torchvision version: 0.22.1+cpu
Ultralytics version: 8.3.173
OpenCV version: 4.12.0
NumPy version: 2.2.6
Pandas version: 2.3.1


In [13]:
# GPU and CUDA availability check
print("GPU/CUDA Information:")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.1f} GB")
else:
    print("CUDA version: N/A (CPU-only training)")
    print("Warning: Training will be significantly slower on CPU")
    print("Consider installing CUDA-enabled PyTorch for GPU acceleration")

GPU/CUDA Information:
CUDA available: False
CUDA version: N/A (CPU-only training)
Warning: Training will be significantly slower on CPU
Consider installing CUDA-enabled PyTorch for GPU acceleration
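
The `+cpu` suffix in the PyTorch version reported in Section 1 (`2.7.1+cpu`) indicates a CPU-only build. With such a build `torch.cuda.is_available()` returns `False` whether or not the machine has an NVIDIA GPU. A minimal sketch for reading that suffix from a version string follows; the helper names `torch_build_tag` and `is_cpu_only_build` are hypothetical and not part of PyTorch.

```python
def torch_build_tag(version: str) -> str:
    """Return the local build tag of a PyTorch version string.

    Hypothetical helper: "2.7.1+cpu" -> "cpu", "2.7.1+cu128" -> "cu128".
    Returns an empty string when the version carries no "+tag" suffix.
    """
    _, _, tag = version.partition("+")
    return tag


def is_cpu_only_build(version: str) -> bool:
    """True when the version string carries the "cpu" build tag."""
    return torch_build_tag(version) == "cpu"


print(is_cpu_only_build("2.7.1+cpu"))
```

In the notebook this could be called as `is_cpu_only_build(torch.__version__)`. A version string without any `+tag` suffix is inconclusive, so the check only confirms the CPU-only case.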


## 2. Directory Structure Verification

In [14]:
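# Verify project directory structure.
# Path("..") resolves relative to the kernel's working directory, so it points
# at the project root only when the notebook is run from notebooks/.
base_dir = Path("..")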
required_dirs = [
    "notebooks", "scripts", "dataset", "models", "configs", "results",
    "dataset/raw", "dataset/processed", "dataset/train", "dataset/val", "dataset/test",
    "dataset/train/images", "dataset/train/labels",
    "dataset/val/images", "dataset/val/labels",
    "dataset/test/images", "dataset/test/labels",
    "models/experiments", "models/checkpoints", "models/final",
    "results/metrics", "results/plots", "results/reports"
]

print("Directory Structure Check:")
missing_dirs = []
for dir_name in required_dirs:
    dir_path = base_dir / dir_name
    if dir_path.exists():
        print(f"✓ {dir_name}")
    else:
        print(f"✗ {dir_name} - MISSING")
        missing_dirs.append(dir_name)

if missing_dirs:
    print(f"\nMissing directories: {len(missing_dirs)}")
    print("Create missing directories manually or re-run setup script")
else:
    print("\n✓ All required directories exist")

Directory Structure Check:
✗ notebooks - MISSING
✗ scripts - MISSING
✗ dataset - MISSING
✗ models - MISSING
✗ configs - MISSING
✗ results - MISSING
✗ dataset/raw - MISSING
✗ dataset/processed - MISSING
✗ dataset/train - MISSING
✗ dataset/val - MISSING
✗ dataset/test - MISSING
✗ dataset/train/images - MISSING
✗ dataset/train/labels - MISSING
✗ dataset/val/images - MISSING
✗ dataset/val/labels - MISSING
✗ dataset/test/images - MISSING
✗ dataset/test/labels - MISSING
✗ models/experiments - MISSING
✗ models/checkpoints - MISSING
✗ models/final - MISSING
✗ results/metrics - MISSING
✗ results/plots - MISSING
✗ results/reports - MISSING

Missing directories: 23
Create missing directories manually or re-run setup script
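
When every directory is reported missing, as in the output above, it may be that `base_dir` resolved to a folder other than the project root; printing `Path("..").resolve()` is a quick way to confirm. Once the base path is right, the missing folders can be created in one pass. This is a minimal sketch, and `create_missing_dirs` is a hypothetical helper rather than part of any setup script referenced here.

```python
from pathlib import Path


def create_missing_dirs(base_dir, dir_names):
    """Create each listed directory under base_dir and return the ones that were new."""
    base = Path(base_dir).resolve()
    created = []
    for name in dir_names:
        path = base / name
        if not path.exists():
            # parents=True also builds intermediate folders such as dataset/train
            path.mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created
```

Calling `create_missing_dirs(base_dir, missing_dirs)` after the check cell would create only the folders that the check reported as missing.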


## 3. Install Additional Dependencies

In [15]:
# Install roboflow for dataset download
!pip install roboflow

Collecting opencv-python-headless==4.10.0.84 (from roboflow)
  Using cached opencv_python_headless-4.10.0.84-cp37-abi3-win_amd64.whl.metadata (20 kB)
Using cached opencv_python_headless-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)
Installing collected packages: opencv-python-headless


ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'C:\\Users\\Rafael Jonathan\\Desktop\\license-plate-training\\venv\\Lib\\site-packages\\cv2\\cv2.pyd'
Check the permissions.
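
A common cause of `[WinError 5] Access is denied` on a `.pyd` file is that the running kernel has already imported the module, so Windows keeps the binary locked while pip tries to overwrite it. In this notebook `cv2` was imported in the first cell before the `roboflow` install tried to replace it. A minimal sketch, assuming that is the cause here, lists which modules are already loaded so you know to restart the kernel and run the install before any imports; `loaded_modules` is a hypothetical helper name.

```python
import sys


def loaded_modules(names):
    """Return the module names from `names` that are already imported in this process."""
    return [name for name in names if name in sys.modules]


# Modules whose binaries pip may need to overwrite (assumed list for this notebook)
locked = loaded_modules(["cv2"])
if locked:
    print(f"Restart the kernel before installing; already loaded: {locked}")
else:
    print("No conflicting modules loaded; safe to run the install cell")
```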



## 4. Dataset Download - Plat Kendaraan (Largest Indonesian License Plate Dataset)

**Dataset Details:**
- **Source:** Roboflow Universe - Plat Kendaraan
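- **Images:** 2,167 listed on Roboflow Universe; the version 1 export downloaded in this notebook contains 5,553 images (5,100 train / 432 valid / 21 test)
- **Format:** YOLO TXT with YAML config
- **License:** MIT (free for commercial and personal use)
- **Classes:** 1 (`License_Plate`), per the downloaded `data.yaml`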
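
In the YOLO TXT detection format, each image has a same-named `.txt` file with one line per object: a class index followed by the box centre x, centre y, width and height, all normalized to the range 0 to 1. A minimal sketch of parsing and sanity-checking one such line is shown here; the function name and the sample line are illustrative and not taken from the dataset.

```python
def parse_yolo_line(line: str, num_classes: int):
    """Parse one YOLO detection label line into (class_id, x, y, w, h).

    Raises ValueError when the line is malformed, the class index is outside
    0..num_classes-1, or a coordinate falls outside [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    coords = [float(value) for value in parts[1:]]
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class index {class_id} out of range")
    if any(not 0.0 <= value <= 1.0 for value in coords):
        raise ValueError("coordinates must be normalized to [0, 1]")
    return (class_id, *coords)


print(parse_yolo_line("0 0.5 0.5 0.25 0.1", num_classes=1))
```

Segmentation exports store polygons with more than five fields per line and would need a different parser.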

In [16]:
# Download Plat Kendaraan dataset with duplicate protection
import roboflow
from pathlib import Path
import os

# Check if dataset already exists
dataset_path = Path("../dataset/raw/plat-kendaraan")

if dataset_path.exists() and (dataset_path / "data.yaml").exists():
    print("✅ Dataset already exists, skipping download")
    print(f"📁 Dataset location: {dataset_path.absolute()}")
    
    # Verify dataset integrity. data.yaml was already confirmed by the
    # condition above, so no second existence check is needed.
    print("✅ Configuration file found")

    # Count images and labels in each split
    splits = ['train', 'valid', 'test']
    total_images = 0

    for split in splits:
        images_dir = dataset_path / split / 'images'
        labels_dir = dataset_path / split / 'labels'

        if images_dir.exists():
            image_count = len(list(images_dir.glob('*.jpg'))) + len(list(images_dir.glob('*.png')))
            label_count = len(list(labels_dir.glob('*.txt'))) if labels_dir.exists() else 0
            total_images += image_count
            print(f"  {split.upper()}: {image_count} images, {label_count} labels")
        else:
            print(f"⚠️  {split.upper()}: images directory missing, dataset may be incomplete")

    print(f"✅ Total dataset: {total_images} images ready for training")
        
else:
    print("📥 Dataset not found, downloading...")
    
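    # Download dataset. Do not hardcode the API key in the notebook: read it
    # from the ROBOFLOW_API_KEY environment variable. When the variable is
    # unset, api_key is None and the credentials stored by roboflow.login()
    # are used instead.
    roboflow.login()
    rf = roboflow.Roboflow(api_key=os.environ.get("ROBOFLOW_API_KEY"))
    project = rf.workspace("plat-kendaraan").project("vehicle-and-license-plate")
    dataset = project.version(1).download("yolov8", location="../dataset/raw/plat-kendaraan/")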
    
    print(f"✅ Dataset downloaded to: {dataset.location}")
    
    # Verify download
    if (dataset_path / "data.yaml").exists():
        print("✅ Download verification successful")
    else:
        print("⚠️  Download verification failed - please check dataset")

📥 Dataset not found, downloading...
You are already logged into Roboflow. To make a different login,run roboflow.login(force=True).
loading Roboflow workspace...
loading Roboflow project...


Downloading Dataset Version Zip in ../dataset/raw/plat-kendaraan/ to yolov8:: 100%|██████████| 338253/338253 [00:44<00:00, 7615.46it/s]

Extracting Dataset Version Zip to ../dataset/raw/plat-kendaraan/ in yolov8:: 100%|██████████| 11118/11118 [00:14<00:00, 755.70it/s]


✅ Dataset downloaded to: c:\Users\Rafael Jonathan\Desktop\dataset\raw\plat-kendaraan
✅ Download verification successful


## 5. Alternative: Manual Dataset Download

If you prefer to download manually, these are the candidate datasets:

In [17]:
# Alternative datasets to consider:
datasets_info = {
    "Plat Kendaraan (Current Choice)": {
        "url": "https://universe.roboflow.com/plat-kendaraan/vehicle-and-license-plate",
        "images": 2167,
        "license": "MIT",
        "format": "YOLO"
    },
    "KSP WORKSPACE (Alternative)": {
        "url": "https://universe.roboflow.com/ksp-workspace/indonesia-license-plate-iqrtj/dataset/2",
        "images": 1607,
        "license": "CC BY 4.0",
        "format": "YOLO"
    },
    "Kaggle Indonesian LP": {
        "url": "https://www.kaggle.com/datasets/juanthomaswijaya/indonesian-license-plate-dataset",
        "images": 1000,
        "license": "Kaggle",
        "format": "YOLO"
    }
}

print("Available Indonesian License Plate Datasets:")
print("=" * 50)
for name, info in datasets_info.items():
    print(f"\n{name}:")
    print(f"  URL: {info['url']}")
    print(f"  Images: {info['images']}")
    print(f"  License: {info['license']}")
    print(f"  Format: {info['format']}")

Available Indonesian License Plate Datasets:

Plat Kendaraan (Current Choice):
  URL: https://universe.roboflow.com/plat-kendaraan/vehicle-and-license-plate
  Images: 2167
  License: MIT
  Format: YOLO

KSP WORKSPACE (Alternative):
  URL: https://universe.roboflow.com/ksp-workspace/indonesia-license-plate-iqrtj/dataset/2
  Images: 1607
  License: CC BY 4.0
  Format: YOLO

Kaggle Indonesian LP:
  URL: https://www.kaggle.com/datasets/juanthomaswijaya/indonesian-license-plate-dataset
  Images: 1000
  License: Kaggle
  Format: YOLO


## 6. Dataset Verification (Run after download)

In [18]:
# Check if dataset was downloaded
dataset_path = Path("../dataset/raw/plat-kendaraan")

if dataset_path.exists():
    print(f"✅ Dataset found at: {dataset_path.absolute()}")
    print("\nDataset directory structure:")
    
    # Show main files and directories
    for item in sorted(dataset_path.iterdir()):
        if item.is_dir():
            # Count files in subdirectories
            file_count = len(list(item.rglob("*.*")))
            print(f"  📁 {item.name}/ ({file_count} files)")
        else:
            print(f"  📄 {item.name}")
    
    print(f"\n🎯 Dataset ready for data exploration!")
else:
    print("❌ Dataset not found. Please run the download section first.")

✅ Dataset found at: c:\Users\Rafael Jonathan\Desktop\license-plate-training\..\dataset\raw\plat-kendaraan

Dataset directory structure:
  📄 data.yaml
  📄 README.dataset.txt
  📄 README.roboflow.txt
  📁 test/ (42 files)
  📁 train/ (10200 files)
  📁 valid/ (864 files)

🎯 Dataset ready for data exploration!


In [19]:
# Basic dataset statistics (run after download)
def analyze_dataset(dataset_root):
    dataset_root = Path(dataset_root)
    
    if not dataset_root.exists():
        print(f"Dataset path {dataset_root} does not exist")
        return
    
    splits = ['train', 'valid', 'test']
    total_images = 0
    total_labels = 0
    
    print("Dataset Analysis:")
    print("=" * 40)
    
    for split in splits:
        images_dir = dataset_root / split / 'images'
        labels_dir = dataset_root / split / 'labels'
        
        if images_dir.exists():
            image_files = list(images_dir.glob('*.jpg')) + list(images_dir.glob('*.png'))
            label_files = list(labels_dir.glob('*.txt')) if labels_dir.exists() else []
            
            print(f"{split.upper()}:")
            print(f"  Images: {len(image_files)}")
            print(f"  Labels: {len(label_files)}")
            print(f"  Match: {'✅' if len(image_files) == len(label_files) else '⚠️'}")
            
            total_images += len(image_files)
            total_labels += len(label_files)
    
    print(f"\nTOTAL:")
    print(f"  Images: {total_images}")
    print(f"  Labels: {total_labels}")
    print(f"  Overall Match: {'✅' if total_images == total_labels else '⚠️'}")
    
    # Check for data.yaml
    yaml_file = dataset_root / 'data.yaml'
    if yaml_file.exists():
        print(f"\n✅ YAML config found: {yaml_file}")
        try:
            import yaml
            with open(yaml_file, 'r') as f:
                config = yaml.safe_load(f)
            print("YAML contents:")
            for key, value in config.items():
                print(f"  {key}: {value}")
        except Exception as e:
            print(f"⚠️  Could not read YAML: {e}")
    else:
        print(f"\n❌ YAML config not found")

# Run analysis with correct path
analyze_dataset("../dataset/raw/plat-kendaraan")

Dataset Analysis:
TRAIN:
  Images: 5100
  Labels: 5100
  Match: ✅
VALID:
  Images: 432
  Labels: 432
  Match: ✅
TEST:
  Images: 21
  Labels: 21
  Match: ✅

TOTAL:
  Images: 5553
  Labels: 5553
  Overall Match: ✅

✅ YAML config found: ..\dataset\raw\plat-kendaraan\data.yaml
YAML contents:
  names: ['License_Plate']
  nc: 1
  roboflow: {'license': 'MIT', 'project': 'vehicle-and-license-plate', 'url': 'https://universe.roboflow.com/plat-kendaraan/vehicle-and-license-plate/dataset/1', 'version': 1, 'workspace': 'plat-kendaraan'}
  test: ../test/images
  train: ../train/images
  val: ../valid/images


## 7. Test YOLOv8 Installation

In [20]:
from ultralytics import YOLO

# Test YOLO model loading
print("Testing YOLOv8 installation...")
try:
    # Load a pretrained YOLOv8 nano model
    model = YOLO('yolov8n.pt')
    print("✓ YOLOv8 model loaded successfully")
    print(f"✓ Model type: {type(model)}")
    print(f"✓ Model device: {model.device}")
    
    # Test model info
    model.info()
    
except Exception as e:
    print(f"✗ Error loading YOLOv8: {e}")

Testing YOLOv8 installation...
✓ YOLOv8 model loaded successfully
✓ Model type: <class 'ultralytics.models.yolo.model.YOLO'>
✓ Model device: cpu
YOLOv8n summary: 129 layers, 3,157,200 parameters, 0 gradients, 8.9 GFLOPs


## 8. Setup Summary

In [21]:
# Generate setup report
report = {
    "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "python_version": sys.version.split()[0],
    "pytorch_version": torch.__version__,
    "ultralytics_version": ultralytics.__version__,
    "cuda_available": torch.cuda.is_available(),
    "gpu_count": torch.cuda.device_count() if torch.cuda.is_available() else 0,
    "device": "GPU" if torch.cuda.is_available() else "CPU"
}

print("Setup Summary Report:")
print("=" * 30)
for key, value in report.items():
    print(f"{key.replace('_', ' ').title()}: {value}")

# Save report
import json
report_path = Path("../results/reports/setup_report.json")
report_path.parent.mkdir(parents=True, exist_ok=True)

with open(report_path, 'w') as f:
    json.dump(report, f, indent=2)

print(f"\nReport saved to: {report_path}")

Setup Summary Report:
Timestamp: 2025-08-05 13:54:52
Python Version: 3.13.2
Pytorch Version: 2.7.1+cpu
Ultralytics Version: 8.3.173
Cuda Available: False
Gpu Count: 0
Device: CPU

Report saved to: ..\results\reports\setup_report.json
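
Later notebooks can read the saved report back instead of repeating the environment checks. A minimal sketch, assuming the JSON keys written by the cell above; `load_setup_report` and `training_device` are hypothetical helper names, and the `"cuda:0"` device string is an assumption about how the training code selects a GPU.

```python
import json
from pathlib import Path


def load_setup_report(path):
    """Load the JSON setup report and return it as a dict."""
    with open(Path(path), "r", encoding="utf-8") as f:
        return json.load(f)


def training_device(report):
    """Map the report to a device string: "cuda:0" when CUDA was available, else "cpu"."""
    return "cuda:0" if report.get("cuda_available") else "cpu"
```

For example, `training_device(load_setup_report("../results/reports/setup_report.json"))` yields `"cpu"` for the report shown above, since `cuda_available` is `False`.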


## Next Steps

After completing this notebook:

1. **If the dataset download succeeded:** Proceed to `02_data_exploration.ipynb`.
2. **If the dataset download failed:**
   - Get a Roboflow API key and retry the download.
   - Or download manually from Kaggle or one of the other sources listed in Section 5.
   - Update the dataset paths accordingly.
3. **If no GPU is available:**
   - Training on CPU is slower but still possible.
   - Consider reducing the batch size and using a smaller model.
   - Or install a CUDA-enabled PyTorch build.
4. **Production integration:**
   - Keep the target output format in mind so the model stays compatible with production.
   - The trained model will be saved as `best_model.pt` for transfer.
   - Final model path: `license-plate/cached_models/yolov8_indonesian_plates.pt`
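
The advice on batch size for CPU training can be sketched as a small helper that picks training arguments from the CUDA check. The argument names (`data`, `epochs`, `imgsz`, `batch`, `device`, `workers`) are Ultralytics training settings, but every value below is an illustrative assumption rather than a tuned recommendation, and `suggest_train_args` is a hypothetical helper name.

```python
def suggest_train_args(cuda_available: bool) -> dict:
    """Return keyword arguments for YOLO.train(); all values are illustrative assumptions."""
    args = {
        "data": "../dataset/raw/plat-kendaraan/data.yaml",
        "epochs": 50,
        "imgsz": 640,
    }
    if cuda_available:
        args.update({"device": 0, "batch": 16})
    else:
        # A smaller batch and image size keep CPU memory use and epoch time manageable
        args.update({"device": "cpu", "batch": 4, "imgsz": 416, "workers": 0})
    return args


print(suggest_train_args(False))
```

With the nano model loaded in Section 7, this could be used as `YOLO("yolov8n.pt").train(**suggest_train_args(torch.cuda.is_available()))`.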