# 🏗️ YOLOv8 Segmentation Training - Google Colab

Train a YOLOv8 segmentation model to detect and segment:
- **Exposed Rebar** 🔩
- **Spalling** 🧱

---

## ⚡ Quick Start Guide

### Before You Begin:
1. **Enable GPU**: Click `Runtime` → `Change runtime type` → Select `GPU` (T4 or better)
2. **Get Roboflow API Key**: Go to https://app.roboflow.com/settings/api and copy your key

### Run the Notebook:
- **Easy Mode**: Click `Runtime` → `Run all` (requires API key in Cell 6)
- **Step by Step**: Run each cell with `Shift + Enter`

---

## 📋 Requirements
- ✅ Google Colab with GPU enabled (T4 or better recommended)
- ✅ Roboflow API Key (free account - get it at https://app.roboflow.com/settings/api)
- ✅ ~2-4 hours training time

## 🎯 Training Pipeline
1. ✓ Check GPU availability
2. ✓ Install dependencies (Ultralytics + Roboflow)
3. ⚠️ **Download dataset** (YOU NEED API KEY HERE!)
4. ✓ Configure training parameters
5. ✓ Train YOLOv8 segmentation model
6. ✓ Visualize results & metrics
7. ✓ Test on sample images
8. ✓ Download trained model


---
## 1️⃣ Check GPU Availability


In [None]:
import torch
import os

# Check GPU
print("🔍 Checking GPU availability...\n")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
    device = '0'  # Use first GPU
else:
    print("⚠️ No GPU detected! Training will be slow on CPU.")
    device = 'cpu'

print(f"\n✅ Training device: {device}")
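
As a cross-check that does not depend on PyTorch, the GPU can also be queried through the `nvidia-smi` command-line tool. The sketch below is an illustration and not part of the original notebook: the helper names `parse_gpu_line` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` is on the `PATH` and reports memory in MiB when called with `--format=csv,noheader,nounits`.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Parse one 'name, memory_mib' CSV line into (name, memory_gib)."""
    name, memory_mib = [part.strip() for part in line.rsplit(",", 1)]
    return name, int(memory_mib) / 1024


def query_gpus():
    """Return a list of (name, memory_gib); empty if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total",
           "--format=csv,noheader,nounits"]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
        return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
    except (subprocess.CalledProcessError, OSError, ValueError):
        # Treat any failure to run or parse the tool as "no GPU information".
        return []


for name, gib in query_gpus():
    print(f"{name}: {gib:.2f} GB")
```

An empty result only means the tool could not be queried; `torch.cuda.is_available()` in the cell above remains the check that matters for training.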


---
## 2️⃣ Install Dependencies

This cell will:
1. Fix NumPy compatibility (downgrade from 2.x to 1.x)
2. Install Ultralytics (YOLOv8)
3. Install Roboflow (for dataset download)

**Note:** The NumPy fix is required because Colab now uses NumPy 2.x by default, but some packages were compiled with NumPy 1.x.


In [None]:
print("📦 Installing dependencies...\n")

# Fix NumPy compatibility issue (Colab has NumPy 2.x but some packages need 1.x)
print("🔧 Step 1: Fixing NumPy compatibility...")
!pip install "numpy<2" -q

# Install ultralytics (YOLOv8)
print("🔧 Step 2: Installing Ultralytics (YOLOv8)...")
!pip install ultralytics -q

# Install roboflow
print("🔧 Step 3: Installing Roboflow...")
!pip install roboflow -q

print("\n✅ All dependencies installed successfully!")
print("✅ NumPy version fixed for compatibility")

# Verify installations (catch Exception rather than using a bare except,
# so Ctrl+C still interrupts the cell and the failure reason is shown)
import numpy as np
print("\n📊 Installed versions:")
print(f"   NumPy: {np.__version__}")
try:
    from ultralytics import YOLO
    print("   Ultralytics: ✓ Installed")
except Exception as error:
    print(f"   Ultralytics: ✗ Failed ({error})")
try:
    from roboflow import Roboflow
    print("   Roboflow: ✓ Installed")
except Exception as error:
    print(f"   Roboflow: ✗ Failed ({error})")
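
One detail worth checking after the install step: `np.__version__` reports the NumPy module that is already loaded in the running session, while `pip` changes what is installed on disk. The sketch below, which is an illustration rather than part of the original notebook, reads the on-disk version through the standard library's `importlib.metadata`. The helper names `numpy_needs_downgrade` and `installed_numpy_version` are hypothetical.

```python
from importlib import metadata


def numpy_needs_downgrade(version):
    """Return True when the given NumPy version string is 2.x or newer."""
    major = int(version.split(".")[0])
    return major >= 2


def installed_numpy_version():
    """Return the installed NumPy version string, or None if NumPy is missing."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


version = installed_numpy_version()
if version is None:
    print("NumPy is not installed")
elif numpy_needs_downgrade(version):
    print(f"NumPy {version} found: install 'numpy<2', then restart the runtime")
else:
    print(f"NumPy {version} is installed (1.x)")
```

If this reports 1.x while `np.__version__` still shows 2.x, the old module is probably still loaded and a runtime restart should pick up the downgraded version.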


---
## 3️⃣ Download Dataset from Roboflow

### ⚠️ IMPORTANT: Get Your Roboflow API Key

**Before running the cell below, you MUST:**

1. Go to: https://app.roboflow.com/settings/api
2. Log in or create a free account
3. Copy your **Private API Key**
4. Paste it in the cell below, replacing `"YOUR_API_KEY_HERE"`

**Example:**
```python
ROBOFLOW_API_KEY = "abc123XYZ456"  # ← Replace with YOUR actual key
```

⚠️ **Do NOT run the cell below until you replace the API key!**
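
If you would rather not paste the key into the notebook at all (for example, because the notebook will be shared), one option is to read it from an environment variable or prompt for it at run time. This is a minimal sketch under stated assumptions: the variable name `ROBOFLOW_API_KEY` and the helper `load_api_key` are chosen here for illustration and do not appear in the original notebook.

```python
import os
from getpass import getpass

PLACEHOLDER = "YOUR_API_KEY_HERE"


def load_api_key(env=None, prompt=None):
    """Return the key from ROBOFLOW_API_KEY, else from the prompt callback."""
    env = os.environ if env is None else env
    key = env.get("ROBOFLOW_API_KEY", "").strip()
    if not key and prompt is not None:
        key = prompt("Roboflow API key: ").strip()
    if not key or key == PLACEHOLDER:
        raise ValueError("API key not configured")
    return key


# In a notebook cell (getpass hides the key as it is typed):
# ROBOFLOW_API_KEY = load_api_key(prompt=getpass)
```

The key then never appears in the saved notebook, and the same placeholder check used in the download cell still applies.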


In [None]:
from roboflow import Roboflow

# 🔑 YOUR ROBOFLOW API KEY
# Get it from: https://app.roboflow.com/settings/api
ROBOFLOW_API_KEY = "YOUR_API_KEY_HERE"  # ← Replace with your actual key (keep it private)

# Validate API key
if ROBOFLOW_API_KEY == "YOUR_API_KEY_HERE":
    print("❌ ERROR: You need to replace 'YOUR_API_KEY_HERE' with your actual Roboflow API key!")
    print("\n📋 Steps to get your API key:")
    print("   1. Go to: https://app.roboflow.com/settings/api")
    print("   2. Log in to your Roboflow account (or create one - it's free!)")
    print("   3. Copy your 'Private API Key'")
    print("   4. Paste it above, replacing 'YOUR_API_KEY_HERE'")
    print("   5. Re-run this cell")
    print("\n💡 Example: ROBOFLOW_API_KEY = \"abc123XYZ456def789\"")
    raise ValueError("API key not configured")

# Initialize Roboflow
print("🔑 Initializing Roboflow with your API key...")
rf = Roboflow(api_key=ROBOFLOW_API_KEY)

# Download the Spalling and Exposed Rebar dataset
print("📦 Downloading dataset from Roboflow...")
print("   Workspace: labelling-9tvkx")
print("   Project: spalling-and-exposed-rebar-ttsjj")
print("   Version: 1")
print("\n⏳ This may take a few minutes...")

project = rf.workspace("labelling-9tvkx").project("spalling-and-exposed-rebar-ttsjj")
dataset = project.version(1).download("yolov8")

print("\n✅ Dataset downloaded successfully!")
print(f"📂 Location: {dataset.location}")
dataset_path = dataset.location

# Show dataset structure
import os
print("\n📊 Dataset Structure:")
for folder in ['train', 'valid', 'test']:
    folder_path = os.path.join(dataset.location, folder, 'images')
    if os.path.exists(folder_path):
        count = len([f for f in os.listdir(folder_path) if f.endswith('.jpg')])
        print(f"   {folder:6s}: {count} images")
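
Beyond counting images, it can be useful to confirm that each image has a matching annotation file. The sketch below assumes the usual YOLO layout of `images/` and `labels/` folders per split, with one `.txt` label file per image stem. The helper name `find_unlabeled_images` is hypothetical and not part of the original notebook.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unlabeled_images(split_dir):
    """Return sorted image names in split_dir/images with no labels/<stem>.txt."""
    split = Path(split_dir)
    images_dir, labels_dir = split / "images", split / "labels"
    if not images_dir.is_dir():
        return []
    missing = []
    for image in images_dir.iterdir():
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        if not (labels_dir / f"{image.stem}.txt").is_file():
            missing.append(image.name)
    return sorted(missing)


# Possible use after the download cell:
# for split in ("train", "valid", "test"):
#     print(split, find_unlabeled_images(Path(dataset_path) / split))
```

An image without a label file is not necessarily an error, since YOLO-style training generally treats it as a background image, but an unexpectedly long list is worth a look before a multi-hour run.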


---
## 💡 Troubleshooting - Common Issues

### 📋 Installation Issues (Cell 4)

#### ❌ Error: "numpy.core.multiarray failed to import" or "NumPy 2.x incompatibility"
**Solution:** Cell 4 (the Install Dependencies cell) already handles this by installing `numpy<2`.
- Re-run **Cell 4** to downgrade NumPy to version 1.x
- You should see "NumPy: 1.x.x" in the output (not 2.x.x)
- If the output still shows 2.x.x or the error persists, restart the runtime (`Runtime → Restart runtime`) so the downgraded NumPy is loaded, then run all cells again

---

### 📦 Dataset Download Issues (Cell 6)

#### ❌ Error: "This API key does not exist"
**Solution:** The placeholder `YOUR_API_KEY_HERE` was not replaced, or the key was mistyped.
- Go back to Cell 6
- Replace the placeholder with your real key from https://app.roboflow.com/settings/api
- Re-run Cell 6

#### ❌ Error: "Workspace not found"
**Solution:** The dataset might not be accessible with your account.
- Option 1: Use a dataset that already exists in your own Roboflow workspace, and change the workspace/project names in Cell 6 to match
- Option 2: Upload your own dataset to Roboflow, then change the workspace/project names in Cell 6 to match

#### ❌ Error: Network/Connection issues
**Solution:**
- Check your internet connection
- Try running Cell 6 again
- The download can take 2-5 minutes for large datasets
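
For flaky connections, "try again" can also be automated. The sketch below wraps any callable in a small retry loop with exponential backoff. It is a generic illustration: the `retry` helper is hypothetical, not part of the Roboflow package or the original notebook.

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: the failure type is unknown here
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying in {delay_seconds:.0f}s")
            sleep(delay_seconds)
            delay_seconds *= 2  # simple exponential backoff


# Possible use in Cell 6:
# dataset = retry(lambda: project.version(1).download("yolov8"))
```

Retrying only helps with transient failures. An invalid API key or a missing workspace will fail the same way on every attempt.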

---

### ✅ If Everything Works:
You should see:
- Cell 4: "NumPy: 1.x.x" and all packages installed ✓
- Cell 6: "✅ Dataset downloaded successfully!" with image counts

👇 **Continue to the next cell if you see the success messages above!**


---
## 4️⃣ Configure Training Parameters


In [None]:
# Training Configuration
CONFIG = {
    'model': 'yolov8n-seg.pt',      # Pretrained YOLOv8 nano segmentation model
    'data': f'{dataset_path}/data.yaml',  # Dataset configuration
    'epochs': 100,                   # Number of training epochs
    'imgsz': 640,                    # Image size
    'batch': 16,                     # Batch size (adjust based on GPU memory)
    'device': device,                # GPU/CPU device
    'project': 'runs/segment',       # Project folder
    'name': 'spalling_rebar_training',  # Experiment name
    'patience': 20,                  # Early stopping patience
    'save': True,                    # Save checkpoints
    'optimizer': 'Adam',             # Optimizer
    'lr0': 0.001,                    # Initial learning rate
    'lrf': 0.01,                     # Final LR as a fraction of lr0 (final LR = lr0 * lrf)
    'momentum': 0.937,               # Momentum (acts as beta1 when the optimizer is Adam)
    'weight_decay': 0.0005,          # Weight decay
    'warmup_epochs': 3.0,            # Warmup epochs
    'box': 7.5,                      # Box loss gain
    'cls': 0.5,                      # Classification loss gain
    'dfl': 1.5,                      # Distribution focal loss gain
    'plots': True,                   # Generate plots
    'verbose': True,                 # Verbose output
    'close_mosaic': 10               # Disable mosaic augmentation for the final N epochs
}

print("📊 Training Configuration:")
print("=" * 60)
for key, value in CONFIG.items():
    print(f"   {key:20s}: {value}")
print("=" * 60)
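
To make the `lr0` and `lrf` settings concrete: `lrf` is a multiplier, so with `lr0 = 0.001` and `lrf = 0.01` the learning rate ends near `0.001 * 0.01 = 0.00001`. The sketch below assumes a linear decay between those two values, which is my understanding of the Ultralytics default when cosine scheduling is not enabled. It ignores warmup, and the function name `linear_lr` is hypothetical.

```python
def linear_lr(epoch, epochs=100, lr0=0.001, lrf=0.01):
    """Learning rate at a given epoch under a linear decay from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


# Print the assumed learning rate at the start, middle, and end of training
for epoch in (0, 50, 100):
    print(f"epoch {epoch:3d}: lr = {linear_lr(epoch):.6f}")
```

Treat the values as a rough guide. The learning rates actually used during a run are written to `results.csv` alongside the metrics.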


---
## 5️⃣ Initialize Model and Start Training


In [None]:
from ultralytics import YOLO
import time

print("\n" + "🚀" * 40)
print("STARTING YOLOV8 SEGMENTATION TRAINING")
print("🚀" * 40 + "\n")

# Initialize model
print("🔧 Loading YOLOv8n-seg pretrained model...")
model = YOLO(CONFIG['model'])

# Start training
print("\n🏋️  Starting training...\n")
start_time = time.time()

results = model.train(**CONFIG)

training_time = time.time() - start_time

print("\n" + "=" * 80)
print("🎉 TRAINING COMPLETED! 🎉")
print("=" * 80)
print(f"\n⏱️  Total training time: {training_time/3600:.2f} hours")
print(f"📂 Model saved to: runs/segment/{CONFIG['name']}/weights/best.pt")
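
Training may finish before epoch 100 because `patience` is set to 20 in the configuration: the run stops once 20 epochs pass without a new best validation score. The sketch below is a simplified illustration of that rule and not the Ultralytics implementation. The function name `early_stop_epoch` and the example scores are made up for demonstration.

```python
def early_stop_epoch(fitness_per_epoch, patience=20):
    """Return the epoch index at which training would stop, or None if it never does."""
    best_fitness, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best_fitness:
            best_fitness, best_epoch = fitness, epoch
        if epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None


# Illustrative scores: the best value appears at epoch 1, then nothing improves
history = [0.1, 0.5] + [0.4] * 30
print(early_stop_epoch(history, patience=20))
```

A shorter-than-expected run is therefore not a failure by itself. It usually means the validation score plateaued.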


---
## 6️⃣ Visualize Training Results


In [None]:
from IPython.display import Image, display
import glob

results_dir = f"runs/segment/{CONFIG['name']}"

print("📈 Training Results Visualizations:\n")

# Display results plot
if os.path.exists(f"{results_dir}/results.png"):
    print("📊 Training Curves:")
    display(Image(filename=f"{results_dir}/results.png"))

# Display confusion matrix
if os.path.exists(f"{results_dir}/confusion_matrix_normalized.png"):
    print("\n🎯 Confusion Matrix:")
    display(Image(filename=f"{results_dir}/confusion_matrix_normalized.png"))

# Display validation predictions (sorted so "first 3" is deterministic)
val_images = sorted(glob.glob(f"{results_dir}/val_batch*_pred.jpg"))
if val_images:
    print("\n🔍 Validation Predictions:")
    for img_path in val_images[:3]:  # Show first 3 batches
        display(Image(filename=img_path, width=800))


---
## 7️⃣ Evaluate Model Performance


In [None]:
import pandas as pd

# Load results CSV
results_csv = f"{results_dir}/results.csv"
if os.path.exists(results_csv):
    df = pd.read_csv(results_csv)
    df.columns = df.columns.str.strip()  # Remove whitespace from column names
    
    print("📊 Final Training Metrics:\n")
    print("=" * 80)
    
    # Get last epoch metrics
    last_epoch = df.iloc[-1]
    
    metrics = [
        ('Box Precision', 'metrics/precision(B)'),
        ('Box Recall', 'metrics/recall(B)'),
        ('Box mAP50', 'metrics/mAP50(B)'),
        ('Box mAP50-95', 'metrics/mAP50-95(B)'),
        ('Mask Precision', 'metrics/precision(M)'),
        ('Mask Recall', 'metrics/recall(M)'),
        ('Mask mAP50', 'metrics/mAP50(M)'),
        ('Mask mAP50-95', 'metrics/mAP50-95(M)'),
    ]
    
    for name, col in metrics:
        if col in df.columns:
            print(f"{name:25s}: {last_epoch[col]:.4f}")
    
    print("=" * 80)
    
    # Show last 10 epochs
    print("\n📉 Last 10 Epochs Performance:")
    print(df[['epoch', 'metrics/mAP50(B)', 'metrics/mAP50-95(B)', 
              'metrics/mAP50(M)', 'metrics/mAP50-95(M)']].tail(10).to_string(index=False))
else:
    print("⚠️ Results CSV not found")
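
The metrics above come from the last epoch, which is not always the epoch that produced `best.pt`. The sketch below looks for the best-scoring row of `results.csv` instead. It assumes a fitness of `0.1 * mAP50 + 0.9 * mAP50-95`, summed over box and mask metrics, which is my understanding of the Ultralytics default. The function names `fitness` and `best_epoch` are hypothetical.

```python
import csv


def fitness(row):
    """Box + mask fitness, each 0.1 * mAP50 + 0.9 * mAP50-95 (assumed weights)."""
    total = 0.0
    for kind in ("B", "M"):
        total += 0.1 * float(row[f"metrics/mAP50({kind})"])
        total += 0.9 * float(row[f"metrics/mAP50-95({kind})"])
    return total


def best_epoch(csv_file):
    """Return (epoch, fitness) for the row with the highest fitness."""
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    # Strip padding from headers and values, as the cell above does for headers
    rows = [{key.strip(): value.strip() for key, value in row.items()} for row in reader]
    best = max(rows, key=fitness)
    return int(float(best["epoch"])), fitness(best)


# Possible use after training:
# with open(results_csv, newline="") as handle:
#     print(best_epoch(handle))
```

Comparing this row with the last-epoch numbers shows how much, if anything, was lost between the best checkpoint and the end of training.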


---
## 8️⃣ Test Model on Sample Images


In [None]:
import matplotlib.pyplot as plt

# Load the best trained model
best_model_path = f"{results_dir}/weights/best.pt"
model = YOLO(best_model_path)

# Get some test images (sorted so the same five are picked on every run)
test_images = sorted(glob.glob(f"{dataset_path}/test/images/*.jpg"))[:5]

if test_images:
    print("🔬 Running inference on test images...\n")

    for img_path in test_images:
        # Run inference
        results = model(img_path)

        # Plot results
        for r in results:
            im_array = r.plot()  # Annotated image as a BGR NumPy array

            # Display
            plt.figure(figsize=(12, 8))
            plt.imshow(im_array[..., ::-1])  # Convert BGR to RGB
            plt.axis('off')
            plt.title(f"Prediction: {os.path.basename(img_path)}")
            plt.tight_layout()
            plt.show()
            
            # Print detections
            if len(r.boxes) > 0:
                print(f"   Detected {len(r.boxes)} objects in {os.path.basename(img_path)}")
                for i, box in enumerate(r.boxes):
                    cls_id = int(box.cls[0])
                    conf = float(box.conf[0])
                    cls_name = model.names[cls_id]
                    print(f"      {i+1}. {cls_name}: {conf:.3f}")
            else:
                print(f"   No objects detected in {os.path.basename(img_path)}")
            print()
else:
    print("⚠️ No test images found")
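
Because this is a segmentation model, each detection also carries a mask outline, which makes it possible to estimate how large a defect is in the image. The sketch below computes the pixel area of a polygon with the shoelace formula. It is an illustration and not part of the original notebook: `polygon_area` is a hypothetical helper, and the commented usage assumes the mask polygons are available as `r.masks.xy`.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) pairs, via the shoelace formula."""
    area = 0.0
    count = len(points)
    for i in range(count):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % count]  # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(polygon_area(square))  # 100.0

# Possible use inside the inference loop (assumes r.masks is not None):
# for polygon in r.masks.xy:
#     print(f"      mask area: {polygon_area(polygon):.0f} px^2")
```

The result is in square pixels. Converting it to a physical area would need a known scale for each photo, which this notebook does not provide.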


---
## 9️⃣ Download Trained Model


In [None]:
from google.colab import files
import shutil

# Download the trained weights, then a zip archive of all training results
print("📦 Preparing files for download...\n")

# Download best model
if os.path.exists(best_model_path):
    print(f"⬇️  Downloading best model: {best_model_path}")
    files.download(best_model_path)

# Optionally download last model
last_model_path = f"{results_dir}/weights/last.pt"
if os.path.exists(last_model_path):
    print(f"⬇️  Downloading last checkpoint: {last_model_path}")
    files.download(last_model_path)

# Create and download results archive
archive_name = f"{CONFIG['name']}_results"
print("\n📦 Creating results archive...")
shutil.make_archive(archive_name, 'zip', results_dir)
print(f"⬇️  Downloading results archive: {archive_name}.zip")
files.download(f"{archive_name}.zip")

print("\n✅ Downloads complete!")


---
## 🔟 Export Model (Optional)

Export to different formats for deployment


In [None]:
# Export to ONNX format (for broader deployment)
print("🔄 Exporting model to ONNX format...\n")

model = YOLO(best_model_path)
export_path = model.export(format='onnx')

print(f"\n✅ Model exported to: {export_path}")
print("\n📦 Available export formats:")
print("   - PyTorch (.pt)")
print("   - ONNX (.onnx)")
print("   - TensorRT (.engine)")
print("   - CoreML (.mlpackage in recent Ultralytics releases)")
print("   - TFLite (.tflite)")
print("\n💡 To export to other formats, use: model.export(format='<format>')")


---
## 📝 Summary

### Training Complete! 🎉

Your YOLOv8 segmentation model has been trained to detect and segment:
- **Exposed Rebar**
- **Spalling**

### 📂 Output Files:
- `best.pt` - Best model weights (highest validation fitness, which Ultralytics computes from mAP50 and mAP50-95, not lowest validation loss)
- `last.pt` - Last epoch checkpoint
- `results.csv` - Training metrics per epoch
- `results.png` - Training curves visualization
- Various plots and validation predictions

### 🚀 Next Steps:
1. Use the model for inference on new images
2. Deploy the model in your application
3. Fine-tune with more data if needed

### 💻 Usage Example:
```python
from ultralytics import YOLO

# Load model
model = YOLO('path/to/best.pt')

# Run inference
results = model('path/to/image.jpg')

# Display results
results[0].show()
```

---
**Happy Segmenting! 🎯**
