# Complete U-Net Training Pipeline for Field Segmentation

This notebook walks through training a U-Net model for field segmentation on satellite imagery in Google Colab, from environment setup to exporting the model for production.

## Overview
- **Phase 1**: Environment Setup and Data Access
- **Phase 2**: Configuration Setup
- **Phase 3**: Training Execution
- **Phase 4**: Evaluation
- **Phase 5**: Export to Production

## Prerequisites
- Google Drive with Sentinel-2 dataset in `sentinel2_datasets/` folder
- GitHub repository access (https://github.com/ns530/skycrop.git)
- Colab runtime with GPU enabled (Runtime > Change runtime type > GPU)

## Phase 1: Environment Setup and Data Access

In [None]:
# 1.1 Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted successfully!")

In [None]:
# 1.2 Clone the SkyCrop repository
!git clone https://github.com/ns530/skycrop.git
%cd skycrop/ml-training
print("Repository cloned and navigated to ml-training directory")

In [None]:
# 1.3 Install dependencies
!pip install -r requirements.txt
print("Dependency installation finished. Check the pip output above for errors.")

In [None]:
# 1.4 Set environment variables and verify setup
import os

# Set data and runs directories
os.environ['DATA_DIR'] = '/content/drive/MyDrive'
os.environ['RUNS_DIR'] = '/content/drive/MyDrive/runs'
os.environ['MODEL_VERSION'] = '1.0.0'

print(f"DATA_DIR: {os.environ['DATA_DIR']}")
print(f"RUNS_DIR: {os.environ['RUNS_DIR']}")
print(f"MODEL_VERSION: {os.environ['MODEL_VERSION']}")
print(f"Current working directory: {os.getcwd()}")

In [None]:
# 1.5 Verify dataset access
dataset_path = '/content/drive/MyDrive/sentinel2_datasets'

if os.path.exists(dataset_path):
    print(f"✓ Dataset found at: {dataset_path}")
    contents = os.listdir(dataset_path)
    print(f"Contents: {contents}")
    
    # Check for expected subdirectories
    expected_dirs = ['train', 'val', 'test']
    for dir_name in expected_dirs:
        dir_path = os.path.join(dataset_path, dir_name)
        if os.path.exists(dir_path):
            print(f"✓ {dir_name}/ directory exists")
        else:
            print(f"✗ {dir_name}/ directory missing")
else:
    print(f"✗ Dataset not found at: {dataset_path}")
    print("Please ensure your dataset is properly organized in Google Drive")
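
The check in 1.5 only prints warnings, so later cells can still run against an incomplete dataset. A minimal sketch of a fail-fast variant is shown here; the helper names `find_missing_splits` and `require_splits` are hypothetical (not part of the SkyCrop repository), and the sketch assumes the same `train`/`val`/`test` layout that the cell checks for.

```python
import os

def find_missing_splits(dataset_path, expected=("train", "val", "test")):
    """Return the expected split directories that are absent under dataset_path."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(dataset_path, name))
    ]

def require_splits(dataset_path, expected=("train", "val", "test")):
    """Raise early if the dataset root or any expected split directory is missing."""
    if not os.path.isdir(dataset_path):
        raise FileNotFoundError(f"Dataset not found at: {dataset_path}")
    missing = find_missing_splits(dataset_path, expected)
    if missing:
        raise FileNotFoundError(f"Missing split directories: {missing}")
```

Calling `require_splits(dataset_path)` at the end of Phase 1 would stop the notebook before training starts rather than partway through it.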

In [None]:
# 1.6 Detailed dataset structure verification
import os
from collections import defaultdict

def analyze_dataset_structure(base_path):
    """Analyze the complete dataset structure and file counts"""
    structure = {}
    total_files = 0
    
    for root, dirs, files in os.walk(base_path):
        level = root.replace(base_path, '').count(os.sep)
        indent = ' ' * 2 * level
        structure[root] = {
            'dirs': dirs,
            'files': files,
            'level': level
        }
        total_files += len(files)
        
        if level <= 2:  # Only show top levels
            print(f"{indent}📁 {os.path.basename(root)}/ ({len(files)} files)")
    
    return structure, total_files

if os.path.exists(dataset_path):
    print("Dataset Structure Analysis:")
    print("=" * 50)
    structure, total_files = analyze_dataset_structure(dataset_path)
    print(f"\nTotal files in dataset: {total_files}")
    
    # File type analysis
    file_types = defaultdict(int)
    for root, info in structure.items():
        for file in info['files']:
            ext = os.path.splitext(file)[1].lower()
            file_types[ext] += 1
    
    print("\nFile types:")
    for ext, count in sorted(file_types.items()):
        print(f"  {ext}: {count} files")
else:
    print("Dataset path does not exist. Please check your Google Drive setup.")
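
The structure analysis counts files but does not confirm that every image has a matching mask. A sketch of such a check follows, assuming an image and its mask share a filename stem (for example `tile_001.png` in both folders). That naming convention and the helper name `unpaired_files` are assumptions; the source does not state how files are named.

```python
import os

def unpaired_files(images_dir, masks_dir,
                   exts=(".png", ".jpg", ".jpeg", ".tif", ".tiff")):
    """Return (images_without_mask, masks_without_image) as sorted lists of stems."""
    def stems(directory):
        # Collect filename stems for files with a recognised image extension.
        return {
            os.path.splitext(name)[0]
            for name in os.listdir(directory)
            if name.lower().endswith(exts)
        }

    image_stems = stems(images_dir)
    mask_stems = stems(masks_dir)
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)
```

Two empty lists mean every image has a mask and vice versa. This is stricter than comparing file counts, which can match even when individual files do not.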

## Phase 2: Configuration Setup

In [None]:
# 2.1 Load and examine the configuration file
import yaml

config_path = 'config.yaml'
if os.path.exists(config_path):
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f)
    
    print("Current Configuration:")
    print("=" * 50)
    print(yaml.dump(config, default_flow_style=False, indent=2))
else:
    print(f"Configuration file not found at {config_path}")
    print("Please ensure you're in the correct directory")

In [None]:
# 2.2 Modify configuration for Colab environment
import yaml

# Load current config
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Update paths for Colab
config['data']['data_dir'] = '/content/drive/MyDrive'
config['data']['tiles_dir'] = '/content/drive/MyDrive/data/tiles'

# Adjust training parameters for Colab
config['train']['batch_size'] = 2  # Smaller batch size for Colab
config['train']['epochs'] = 10     # Fewer epochs for testing

# Save updated config
with open('config_colab.yaml', 'w') as f:
    yaml.dump(config, f, default_flow_style=False, indent=2)

print("Updated configuration saved as 'config_colab.yaml'")
print("Key changes:")
print(f"- Data directory: {config['data']['data_dir']}")
print(f"- Batch size: {config['train']['batch_size']}")
print(f"- Epochs: {config['train']['epochs']}")
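
Cell 2.2 indexes `config['data']` and `config['train']` directly, which raises `KeyError` if the loaded `config.yaml` lacks either section. A sketch of a more defensive override helper using dotted keys is shown here; `apply_overrides` is a hypothetical name, and the keys in the usage note simply mirror the ones used in the cell.

```python
import copy

def apply_overrides(config, overrides):
    """Return a copy of config with dotted-key overrides applied.

    Missing sections are created; a non-dict value found where a section
    is expected is replaced by a new dict.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return result
```

For example, `apply_overrides(config, {"train.batch_size": 2, "train.epochs": 10})` returns a new dictionary and leaves the original `config` untouched, which makes it easy to compare the two before writing `config_colab.yaml`.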

In [None]:
# 2.3 Verify model registry
import json

registry_path = 'model_registry.json'
if os.path.exists(registry_path):
    with open(registry_path, 'r') as f:
        registry = json.load(f)
    
    print("Current Model Registry:")
    print("=" * 50)
    if registry:
        for model_name, versions in registry.items():
            print(f"Model: {model_name}")
            for version, info in versions.items():
                print(f"  Version {version}: {info.get('created_at', 'N/A')}")
    else:
        print("Registry is empty - no trained models yet")
else:
    print("Model registry not found. Will be created during export.")

## Phase 3: Training Execution

In [None]:
# 3.1 Run U-Net training
print("Starting U-Net training...")
print("This may take several hours depending on your dataset size.")
print("Monitor progress in the cell output below.")
print("=" * 50)

!python train_unet.py --config config_colab.yaml

In [None]:
# 3.2 Check training results
import json
import os

# Find the latest training run
runs_dir = '/content/drive/MyDrive/runs'
if os.path.exists(runs_dir):
    runs = [d for d in os.listdir(runs_dir) if os.path.isdir(os.path.join(runs_dir, d))]
    if runs:
        # Lexicographic max: this is the newest run only if run names sort chronologically
        latest_run = max(runs)
        summary_path = os.path.join(runs_dir, latest_run, 'train_summary.json')
        
        if os.path.exists(summary_path):
            with open(summary_path, 'r') as f:
                summary = json.load(f)
            
            print("Training Summary:")
            print("=" * 50)
            print(f"Status: {summary.get('status', 'Unknown')}")
            print(f"Duration: {summary.get('elapsed_sec', 0)/3600:.2f} hours")
            print(f"Epochs completed: {summary.get('epochs', 0)}")
            
            if 'val_metrics' in summary:
                metrics = summary['val_metrics']
                print(f"\nValidation Metrics:")
                print(f"  IoU: {metrics.get('iou', 0):.4f}")
                print(f"  Dice: {metrics.get('dice', 0):.4f}")
                print(f"  Loss: {metrics.get('loss', 0):.4f}")
            
            if 'best_checkpoint' in summary:
                print(f"\nBest checkpoint: {summary['best_checkpoint']}")
        else:
            print("Training summary not found. Training may still be running or failed.")
    else:
        print("No training runs found in runs directory.")
else:
    print("Runs directory not found.")
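
`max(runs)` compares directory names as strings, so it returns the newest run only when the names sort chronologically (for example, `run_9` sorts after `run_10`). A sketch that selects by modification time instead is shown here; `latest_run_dir` is a hypothetical helper. Modification times on a mounted Drive may not always reflect creation order, so treat this as a heuristic.

```python
import os

def latest_run_dir(runs_dir):
    """Return the most recently modified subdirectory of runs_dir, or None if there is none."""
    with os.scandir(runs_dir) as entries:
        candidates = [entry.path for entry in entries if entry.is_dir()]
    if not candidates:
        return None
    # Pick the directory with the largest modification timestamp.
    return max(candidates, key=os.path.getmtime)
```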

In [None]:
# 3.3 Monitor training with TensorBoard (optional)
# Uncomment the following lines to start TensorBoard
# %load_ext tensorboard
# %tensorboard --logdir /content/drive/MyDrive/runs
print("To monitor training progress with TensorBoard:")
print("1. Uncomment the lines above")
print("2. Run this cell")
print("3. Click the TensorBoard link that appears")

## Phase 4: Evaluation

In [None]:
# 4.1 Load and evaluate the trained model
import tensorflow as tf
import numpy as np
import json
import os

def load_trained_model():
    """Load the best trained model from the latest run"""
    runs_dir = '/content/drive/MyDrive/runs'
    
    if not os.path.exists(runs_dir):
        print("Runs directory not found")
        return None
    
    runs = [d for d in os.listdir(runs_dir) if os.path.isdir(os.path.join(runs_dir, d))]
    if not runs:
        print("No training runs found")
        return None
    
    # Lexicographic max: this is the newest run only if run names sort chronologically
    latest_run = max(runs)
    summary_path = os.path.join(runs_dir, latest_run, 'train_summary.json')
    
    if not os.path.exists(summary_path):
        print("Training summary not found")
        return None
    
    with open(summary_path, 'r') as f:
        summary = json.load(f)
    
    checkpoint_path = summary.get('best_checkpoint')
    if not checkpoint_path or not os.path.exists(checkpoint_path):
        print("Best checkpoint not found")
        return None
    
    print(f"Loading model from: {checkpoint_path}")
    model = tf.keras.models.load_model(checkpoint_path, compile=False)
    return model

model = load_trained_model()
if model is not None:
    print("✓ Model loaded successfully")
    print(f"  Input shape: {model.input_shape}")
    print(f"  Output shape: {model.output_shape}")
else:
    print("✗ Failed to load model")

In [None]:
# 4.2 Evaluate on test set
import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def evaluate_model(model, test_images_dir, test_masks_dir, num_samples=5):
    """Evaluate model on test set and visualize results"""
    if model is None:
        print("No model available for evaluation")
        return
    
    # Get test files
    image_files = sorted([f for f in os.listdir(test_images_dir) if f.endswith(('.png', '.jpg', '.jpeg'))])
    mask_files = sorted([f for f in os.listdir(test_masks_dir) if f.endswith(('.png', '.jpg', '.jpeg'))])
    
    if len(image_files) != len(mask_files):
        print("Mismatch between number of images and masks")
        return
    
    print(f"Evaluating on {min(num_samples, len(image_files))} of {len(image_files)} test samples")
    
    # Calculate metrics
    ious = []
    dices = []
    
    for i, (img_file, mask_file) in enumerate(zip(image_files[:num_samples], mask_files[:num_samples])):
        # Load and preprocess
        img_path = os.path.join(test_images_dir, img_file)
        mask_path = os.path.join(test_masks_dir, mask_file)
        
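        img = np.array(Image.open(img_path).resize((512, 512))) / 255.0
        # Nearest-neighbour resize keeps mask values binary instead of blending them
        mask = np.array(Image.open(mask_path).resize((512, 512), Image.NEAREST)) / 255.0
        mask = (mask > 0.5).astype(np.float32)

        # Predict, then drop any trailing channel axis: multiplying an (H, W, 1)
        # prediction by an (H, W) mask would broadcast to (H, W, W) and corrupt the metrics
        pred = np.squeeze(model.predict(np.expand_dims(img, axis=0))[0])
        pred_binary = (pred > 0.5).astype(np.float32)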
        
        # Calculate metrics
        intersection = np.sum(pred_binary * mask)
        union = np.sum(pred_binary) + np.sum(mask) - intersection
        iou = intersection / union if union > 0 else 0
        dice = 2 * intersection / (np.sum(pred_binary) + np.sum(mask)) if (np.sum(pred_binary) + np.sum(mask)) > 0 else 0
        
        ious.append(iou)
        dices.append(dice)
        
        # Visualize first few samples
        if i < 3:
            fig, axes = plt.subplots(1, 3, figsize=(15, 5))
            axes[0].imshow(img)
            axes[0].set_title('Input Image')
            axes[1].imshow(mask, cmap='gray')
            axes[1].set_title('Ground Truth')
            axes[2].imshow(pred_binary, cmap='gray')
            axes[2].set_title(f'Prediction\nIoU: {iou:.3f}, Dice: {dice:.3f}')
            plt.show()
    
    # Summary statistics
    print("\nEvaluation Results:")
    print(f"Mean IoU: {np.mean(ious):.4f} ± {np.std(ious):.4f}")
    print(f"Mean Dice: {np.mean(dices):.4f} ± {np.std(dices):.4f}")
    
    return np.mean(ious), np.mean(dices)

# Run evaluation
test_images_dir = '/content/drive/MyDrive/sentinel2_datasets/test/test_images'
test_masks_dir = '/content/drive/MyDrive/sentinel2_datasets/test/test_masks'

if os.path.exists(test_images_dir) and os.path.exists(test_masks_dir):
    results = evaluate_model(model, test_images_dir, test_masks_dir)
    if results is not None:  # evaluate_model returns None when it exits early
        mean_iou, mean_dice = results
else:
    print("Test directories not found. Please check your dataset structure.")
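
The per-sample metrics in `evaluate_model` are IoU = |P ∩ G| / |P ∪ G| and Dice = 2|P ∩ G| / (|P| + |G|), where P is the thresholded prediction and G the ground-truth mask. A standalone sketch with a small worked example follows; `iou_and_dice` is a hypothetical helper, and it scores two empty masks as 0, matching the cell above.

```python
import numpy as np

def iou_and_dice(pred, target, threshold=0.5):
    """Compute IoU and Dice for one predicted probability map against a binary mask."""
    # Squeeze so an (H, W, 1) prediction lines up with an (H, W) mask.
    pred_bin = np.squeeze(np.asarray(pred)) > threshold
    target_bin = np.squeeze(np.asarray(target)) > 0.5
    if pred_bin.shape != target_bin.shape:
        raise ValueError(f"Shape mismatch: {pred_bin.shape} vs {target_bin.shape}")

    intersection = np.logical_and(pred_bin, target_bin).sum()
    total = pred_bin.sum() + target_bin.sum()
    union = total - intersection

    iou = float(intersection / union) if union > 0 else 0.0
    dice = float(2 * intersection / total) if total > 0 else 0.0
    return iou, dice

# Worked example: 2 predicted pixels, 1 true pixel, 1 pixel in common.
pred = np.array([[0.9, 0.8], [0.2, 0.1]])[..., np.newaxis]
target = np.array([[1, 0], [0, 0]])
print(iou_and_dice(pred, target))
```

In the example the intersection is 1 and the union is 2, so IoU is 0.5. Dice is 2·1 / (2 + 1), roughly 0.667.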

In [None]:
# 4.3 Run model tests
print("Running model tests...")
!python -m pytest tests/test_unet_train_smoke.py -v
print("Tests completed.")

## Phase 5: Export to Production

In [None]:
# 5.1 Export the trained model
print("Exporting trained model to production formats...")
!python export.py --config config_colab.yaml --version 1.0.0
print("Export command finished. Check the output above for errors.")

In [None]:
# 5.2 Verify exported model
import os
import json

export_dir = '/content/skycrop/ml-training/u-net-trained-model/models/unet/1.0.0'
if os.path.exists(export_dir):
    print(f"✓ Export directory found: {export_dir}")
    contents = os.listdir(export_dir)
    print(f"Contents: {contents}")
    
    # Check for required files
    required_files = ['savedmodel.tar.gz', 'model.onnx', 'metrics.json', 'sha256.txt']
    for file in required_files:
        if file in contents:
            print(f"✓ {file} present")
        else:
            print(f"✗ {file} missing")
    
    # Check metrics
    metrics_path = os.path.join(export_dir, 'metrics.json')
    if os.path.exists(metrics_path):
        with open(metrics_path, 'r') as f:
            metrics = json.load(f)
        print(f"\nModel Metrics:")
        for key, value in metrics.items():
            print(f"  {key}: {value}")
else:
    print(f"✗ Export directory not found: {export_dir}")
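
Cell 5.2 checks that `sha256.txt` exists but not that the recorded checksums match the exported files. A sketch of a verification step follows. It assumes `sha256.txt` uses the `sha256sum` layout (one `<hex digest>  <filename>` pair per line); the source does not show the file's format, so adjust the parsing if `export.py` writes something else. The helper names are hypothetical.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_checksums(export_dir, manifest_name="sha256.txt"):
    """Return {filename: bool} for each manifest entry.

    ASSUMPTION: each non-empty line is '<hex digest> <filename>' (sha256sum style).
    """
    results = {}
    with open(os.path.join(export_dir, manifest_name), "r") as f:
        for line in f:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            expected = parts[0].lower()
            name = parts[1].strip().lstrip("*")  # sha256sum marks binary mode with '*'
            file_path = os.path.join(export_dir, name)
            results[name] = (
                os.path.isfile(file_path) and sha256_of(file_path) == expected
            )
    return results
```

Any `False` value in the result points to a file that is missing or whose contents differ from what the manifest recorded.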

In [None]:
# 5.3 Update and verify model registry
import json

registry_path = 'model_registry.json'
if os.path.exists(registry_path):
    with open(registry_path, 'r') as f:
        registry = json.load(f)
    
    print("Updated Model Registry:")
    print("=" * 50)
    for model_name, versions in registry.items():
        print(f"Model: {model_name}")
        for version, info in versions.items():
            print(f"  Version {version}:")
            print(f"    Created: {info.get('created_at', 'N/A')}")
            print(f"    URI: {info.get('uri', 'N/A')}")
            if 'metrics' in info:
                print(f"    Metrics: {info['metrics']}")
else:
    print("Model registry not found.")

In [None]:
# 5.4 Test inference with exported model
import onnxruntime as ort
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

def test_onnx_inference(onnx_path, test_image_path):
    """Test inference with the exported ONNX model"""
    if not os.path.exists(onnx_path):
        print(f"ONNX model not found: {onnx_path}")
        return
    
    # Load ONNX model
    session = ort.InferenceSession(onnx_path)
    
    # Load and preprocess test image
    if os.path.exists(test_image_path):
        img = np.array(Image.open(test_image_path).resize((512, 512))) / 255.0
        img = np.expand_dims(img, axis=0).astype(np.float32)
        
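        # Run inference; look up the input name instead of hard-coding 'input_1'
        input_name = session.get_inputs()[0].name
        outputs = session.run(None, {input_name: img})
        # Drop any trailing channel axis so the mask can be shown as a 2-D image
        prediction = np.squeeze(outputs[0][0])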
        
        # Visualize
        plt.figure(figsize=(10, 5))
        plt.subplot(1, 2, 1)
        plt.imshow(img[0])
        plt.title('Input Image')
        plt.subplot(1, 2, 2)
        plt.imshow(prediction > 0.5, cmap='gray')
        plt.title('ONNX Prediction')
        plt.show()
        
        print("✓ ONNX inference test successful")
    else:
        print(f"Test image not found: {test_image_path}")

# Test with exported ONNX model
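# Note: cell 5.2 looks for exports under a different directory (export_dir);
# point onnx_path at wherever export.py actually wrote model.onnx
onnx_path = 'models/unet/1.0.0/model.onnx'
test_images_dir = '/content/drive/MyDrive/sentinel2_datasets/test/test_images'
# Sort so the chosen test image is the same on every run
test_image_path = os.path.join(test_images_dir, sorted(os.listdir(test_images_dir))[0])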

test_onnx_inference(onnx_path, test_image_path)
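
The Keras and ONNX cells both scale pixel values by 255 and add a batch axis, but neither guards against grayscale or RGBA inputs. A sketch of a shared preprocessing helper follows. It assumes the model expects a float32 batch of shape `(1, H, W, 3)` with values in `[0, 1]`, which is what the cells above imply but the source does not confirm; `to_model_input` is a hypothetical name.

```python
import numpy as np

def to_model_input(image, channels=3):
    """Convert an H x W or H x W x C uint8 array to a float32 batch (1, H, W, channels)."""
    arr = np.asarray(image)
    if arr.ndim == 2:
        # Grayscale: replicate the single band into `channels` bands.
        arr = np.repeat(arr[..., np.newaxis], channels, axis=-1)
    elif arr.ndim == 3 and arr.shape[-1] > channels:
        # Extra bands (e.g. the alpha channel of an RGBA image) are dropped.
        arr = arr[..., :channels]
    if arr.ndim != 3 or arr.shape[-1] != channels:
        raise ValueError(
            f"Cannot convert array of shape {arr.shape} to {channels} channels"
        )
    # Scale to [0, 1] and add the leading batch axis.
    return (arr.astype(np.float32) / 255.0)[np.newaxis, ...]
```

Using one helper for both the Keras evaluation and the ONNX test keeps the two paths comparable, since any difference in output then comes from the model export rather than from preprocessing.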

## Summary and Next Steps

Congratulations! You have completed the full U-Net training pipeline. Here is what you accomplished:

### ✅ Completed Phases:
1. **Environment Setup**: Mounted Drive, cloned repo, installed dependencies
2. **Configuration**: Modified config for Colab environment
3. **Training**: Executed U-Net training with monitoring
4. **Evaluation**: Assessed model performance on test set
5. **Export**: Converted model to production formats (ONNX, SavedModel)

### 📁 Generated Artifacts:
- Trained model checkpoints in `/content/drive/MyDrive/runs/`
- Exported models in `models/unet/1.0.0/`
- Updated model registry in `model_registry.json`
- Training metrics and evaluation results

### 🚀 Deployment Ready:
Your trained U-Net model is now ready for deployment to the SkyCrop ML service for field segmentation tasks.

### 🔧 Troubleshooting Tips:
- **Memory Issues**: Reduce batch_size in config
- **Long Training**: Use GPU runtime and monitor with TensorBoard
- **Dataset Issues**: Verify file paths and formats
- **Export Failures**: Check model loading and ONNX compatibility

### 📚 Additional Resources:
- Check the SkyCrop documentation for ML service integration
- Review the model registry for version management
- Use the exported ONNX model for inference in production