# InstaGeo Raster Chip Creator Demo

This notebook demonstrates how to use the InstaGeo Raster Chip Creator to generate image chips and segmentation maps from raster files. The Raster Chip Creator is ideal for scenarios where you have geospatial data in raster format and want to generate training data for machine learning models.

## Prerequisites
- Complete `chip_creator_demo.ipynb` first to understand the basic InstaGeo workflow
- Have raster files containing label/class information
- Understand the difference between point-based and raster-based chip creation


## 1. Setup and Imports


In [None]:
import os
import json
import geopandas as gpd
import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import show
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)


# Set up paths
DEMO_DIR = Path("demo_data_2")
OUTPUT_DIR = Path("chip_output_2")
DEMO_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)

print("✅ Setup complete!")
print(f"📁 Demo data directory: {DEMO_DIR}")
print(f"📁 Output directory: {OUTPUT_DIR}")


## 2. Understanding Raster Chip Creator

The Raster Chip Creator is designed for scenarios where you have raster files containing label information rather than point observations. 

### Key Differences:
- **🔹 Point-based (chip_creator)**: Uses CSV with point coordinates + labels
- **🔹 Raster-based (raster_chip_creator)**: Uses raster files with pixel-level labels

### Key Features:
- Processes raster files containing geospatial label information
- Extracts tiles automatically based on raster geolocation
- Crops tiles into fixed-size chips (e.g., 256x256 pixels)
- Generates segmentation maps from raster pixel values
- Supports both HLS and Sentinel-2 data sources
- Includes bounding box feature processing option

### Workflow:
1. **Input**: Raster file(s) with label information (or bounding boxes)
2. **Extract**: Tiles from raster based on geolocation
3. **Crop**: Tiles into smaller chips
4. **Generate**: Segmentation maps from pixel values (if applicable)
5. **Output**: Chips + segmentation maps (if applicable) ready for ML training

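The cropping step (step 3) can be illustrated with a minimal NumPy sketch. The helper `crop_into_chips` is hypothetical and not part of InstaGeo. It assumes non-overlapping chips and drops partial chips at the right and bottom edges, which may differ from how the real tool handles edges.

```python
import numpy as np


def crop_into_chips(tile, chip_size=256):
    """Split a 2-D array into non-overlapping chip_size x chip_size chips.

    Hypothetical helper for illustration only. Partial chips at the right
    and bottom edges are dropped (an assumption of this sketch).
    Returns a list of ((row_offset, col_offset), chip_array) tuples.
    """
    rows, cols = tile.shape
    chips = []
    for r in range(0, rows - chip_size + 1, chip_size):
        for c in range(0, cols - chip_size + 1, chip_size):
            chips.append(((r, c), tile[r:r + chip_size, c:c + chip_size]))
    return chips


# A 1000 x 1000 tile holds a 3 x 3 grid of full 256 x 256 chips
tile = np.zeros((1000, 1000), dtype=np.uint8)
chips = crop_into_chips(tile, chip_size=256)
print(len(chips))  # 9
```

With these assumptions, the last 232 rows and columns of a 1000 x 1000 tile are not covered by any chip.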

## 3. Use Case 1: Standard Raster Processing

**Use Case**: Land Cover Classification from Existing Raster Data

**Scenario**: You have a land cover raster file where each pixel represents a land cover class (forest, water, urban, etc.) and want to create training chips for a segmentation model.

**This is ideal when:**
- You have existing classified raster data
- You want to train models on pixel-level classifications
- You need to create training data from large raster files


In [None]:
# Create sample raster data for demonstration
# We will assume this is our original raster data that contains
# labels from which we want to create patches
def create_sample_landcover_raster():
    """Create a sample land cover raster with a fixed ~30 m spatial resolution."""
    
    # 30 m expressed in degrees (EPSG:4326), measured along the equator
    spatial_resolution = 0.0002694945852358564
    
    width, height = 1000, 1000
    
    # Define bounds using the exact spatial resolution
    west, south = -74.0, 40.7
    east = west + (width * spatial_resolution)
    north = south + (height * spatial_resolution)
    
    # Create transform with exact spatial resolution
    transform = rasterio.transform.from_bounds(west, south, east, north, width, height)
    
    # Verify the resolution
    print(f"Spatial resolution: {spatial_resolution:.15f}°")
    print(f"Pixel size X: {transform[0]:.15f}°")
    print(f"Pixel size Y: {abs(transform[4]):.15f}°")
    
    # Create land cover classes
    # 0: Water, 1: Forest, 2: Urban, 3: Agriculture, 4: Grassland
    landcover = np.zeros((height, width), dtype=np.uint8)
    
    np.random.seed(42)
    
    # Simple approach: Create rectangular patches
    patch_size = 100  # Size of each patch
    
    for i in range(0, height, patch_size):
        for j in range(0, width, patch_size):
            # Randomly assign land cover type to each patch
            landcover[i:i+patch_size, j:j+patch_size] = np.random.choice([0, 1, 2, 3, 4], 
                                                                        p=[0.1, 0.3, 0.2, 0.25, 0.15])
    
    # Add some variation within patches
    for i in range(0, height, patch_size//2):
        for j in range(0, width, patch_size//2):
            if np.random.random() < 0.3:
                landcover[i:i+patch_size//2, j:j+patch_size//2] = np.random.choice([0, 1, 2, 3, 4])
    
    # Save raster with exact spatial resolution
    raster_path = DEMO_DIR / "sample_landcover.tif"
    with rasterio.open(
        raster_path, 'w',
        driver='GTiff',
        height=height, width=width,
        count=1,
        dtype=rasterio.uint8,
        crs='EPSG:4326',
        transform=transform
    ) as dst:
        dst.write(landcover, 1)
    
    return raster_path

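The `spatial_resolution` constant above appears to be 30 m converted to degrees of arc along the equator, using the WGS84 equatorial radius. The following sketch reproduces that value. The function names are illustrative, and the east-west figure uses a simple spherical cosine approximation.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6378137.0  # semi-major axis of the WGS84 ellipsoid


def meters_to_degrees_at_equator(meters):
    """Convert a ground distance to degrees of arc along the equator."""
    meters_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360
    return meters / meters_per_degree


def east_west_pixel_size_m(resolution_deg, latitude_deg):
    """Approximate east-west ground size of a pixel at a given latitude."""
    meters_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360
    return resolution_deg * meters_per_degree * math.cos(math.radians(latitude_deg))


res = meters_to_degrees_at_equator(30)
print(f"{res:.15f}")  # agrees with the constant used in the cell above

# At the sample raster's latitude (about 40.7 degrees north), a pixel is narrower east-west
ew = east_west_pixel_size_m(res, 40.7)
print(f"{ew:.1f} m")
```

Because degrees of longitude shrink with latitude, a pixel of this size spans roughly 23 m east-west at 40.7°N while remaining about 30 m north-south.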
In [None]:
# Visualize the sample raster
def visualize_sample_raster(raster_path):
    """Visualize the sample raster data."""
    
    with rasterio.open(raster_path) as src:
        data = src.read(1)
        bounds = src.bounds
    
    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
    
    # Create colormap for land cover classes
    colors = ['blue', 'green', 'gray', 'yellow', 'lightgreen']
    cmap = plt.matplotlib.colors.ListedColormap(colors)
    
    # Pin vmin/vmax so each class keeps its color even when a raster lacks some classes;
    # nearest-neighbor interpolation avoids blending categorical values
    im = ax.imshow(data, cmap=cmap, vmin=0, vmax=4, interpolation='nearest',
                   extent=[bounds.left, bounds.right, bounds.bottom, bounds.top])
    
    ax.set_title('Sample Land Cover Raster', fontsize=14, fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    
    # Add colorbar
    cbar = plt.colorbar(im, ax=ax, shrink=0.8)
    cbar.set_label('Land Cover Class')
    cbar.set_ticks([0, 1, 2, 3, 4])
    cbar.set_ticklabels(['Water', 'Forest', 'Urban', 'Agriculture', 'Grassland'])
    
    plt.tight_layout()
    plt.show()
    
    print(f"📊 Raster dimensions: {data.shape}")
    print(f"📊 Unique classes: {np.unique(data)}")
    print(f"📊 Class distribution: {np.bincount(data.flatten())}")

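The raw `np.bincount` output printed above is easier to read when paired with class names. A small sketch, assuming the five class IDs used in this demo (the helper `class_fractions` is illustrative):

```python
import numpy as np

CLASS_NAMES = ["Water", "Forest", "Urban", "Agriculture", "Grassland"]


def class_fractions(label_array, class_names=CLASS_NAMES):
    """Return the fraction of pixels per class name.

    minlength keeps one slot per known class, so classes that are absent
    from the array are reported as 0.0 instead of being dropped.
    """
    counts = np.bincount(label_array.ravel(), minlength=len(class_names))
    total = counts.sum()
    return {name: float(counts[i]) / total for i, name in enumerate(class_names)}


labels = np.array([[0, 1, 1], [3, 3, 3]], dtype=np.uint8)
fractions = class_fractions(labels)
for name, frac in fractions.items():
    print(f"{name:12s} {frac:.2f}")
```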

In [None]:
# Generate the full sample raster
print("🔄 Creating sample data...")
raster_path = create_sample_landcover_raster()
print(f"✅ Sample raster created: {raster_path}")
# Visualize the sample data
print("\n📊 Visualizing sample raster data:")
visualize_sample_raster(raster_path)



In [None]:

# Create sample records file
def create_sample_records():
    """Create a sample records file with geometries and existing segmentation maps."""
    
    # Create sample geometries (polygons representing areas of interest)
    # This is the data that we will use to extract segmentation maps
    # and retrieve satellite images for the corresponding areas
    from shapely.geometry import Polygon
    
    # Create segmentation map files by extracting 256x256 patches from the big raster
    raster_path = DEMO_DIR / "sample_landcover.tif"
    seg_map_dir = DEMO_DIR / "seg_maps"
    seg_map_dir.mkdir(exist_ok=True)
    
    label_filenames = []
    geometries = []
    
    with rasterio.open(raster_path) as src:
        # Define patch locations as (row, col) pixel offsets of each patch's top-left corner
        patch_locations = [
            (0, 0),   # First patch
            (400, 300),   # Second patch  
            (700, 500)    # Third patch
        ]
        
        dates = ["2024-06-15", "2024-06-20", "2024-06-25"]
        mgrs_tile_ids = ["18TWL", "18TWL", "18TWL"]
        
        for i, (row, col) in enumerate(patch_locations):
            # Window covering this 256x256 patch: (col_off, row_off, width, height)
            window = rasterio.windows.Window(col, row, 256, 256)
            
            # Extract 256x256 patch
            patch = src.read(1, window=window)
            
            # Create output filename
            label_filename = f"label_{i+1}.tif"
            label_path = seg_map_dir / label_filename
            
            # Get transform for this patch
            patch_transform = rasterio.windows.transform(window, src.transform)
            
            # Get bounds for this patch
            bounds = rasterio.windows.bounds(window, src.transform)

            # Create geometry for this patch
            left, bottom, right, top = bounds
            geom = Polygon([
                (left, bottom),
                (right, bottom), 
                (right, top),
                (left, top),
                (left, bottom)
            ])
            geometries.append(geom)
            
            # Save the segmentation map
            with rasterio.open(
                label_path, 'w',
                driver='GTiff',
                height=256,
                width=256,
                count=1,
                dtype=rasterio.uint8,
                crs=src.crs,
                transform=patch_transform
            ) as dest:
                dest.write(patch, 1)
            
            label_filenames.append(label_filename)
            print(f"Created segmentation map: {label_filename} with dimensions {patch.shape}")
    
    records_data = {
        'geometry': geometries,
        'date': dates,
        'area_id': [1, 2, 3],
        'mgrs_tile_id': mgrs_tile_ids,
        'label_filename': label_filenames
    }
    
    records_df = gpd.GeoDataFrame(records_data, crs='EPSG:4326')
    records_path = DEMO_DIR / "sample_records.gpkg"
    records_df.to_file(records_path, driver='GPKG')
    
    return records_path, label_filenames

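The geometry stored for each record is the geographic footprint of its pixel window. For a north-up raster with square pixels, that footprint can be computed by hand, as in the sketch below. The helper `window_bounds` is illustrative and should match what `rasterio.windows.bounds` returns only under those assumptions (no rotation, equal pixel size in x and y).

```python
def window_bounds(west, north, pixel_size, col_off, row_off, width, height):
    """Return (left, bottom, right, top) of a pixel window.

    Assumes a north-up raster whose top-left corner is at (west, north)
    and whose pixels are square with side pixel_size (in CRS units).
    """
    left = west + col_off * pixel_size
    top = north - row_off * pixel_size  # rows count downward from the top edge
    right = left + width * pixel_size
    bottom = top - height * pixel_size
    return left, bottom, right, top


# Second demo patch: row offset 400, column offset 300 in the sample raster
res = 0.0002694945852358564
west, south = -74.0, 40.7
north = south + 1000 * res
print(window_bounds(west, north, res, col_off=300, row_off=400, width=256, height=256))
```

Row offsets move the window south and column offsets move it east, which is why `top` decreases with `row_off` while `left` increases with `col_off`.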
In [None]:
# Generate sample records 
records_path, label_filenames = create_sample_records()
print(f"✅ Sample records created: {records_path}")
# Visualize the segmentation maps extracted for each record
for label_filename in label_filenames:
    visualize_sample_raster(DEMO_DIR / "seg_maps" / label_filename)


### Standard Raster Processing Command

In [None]:
!python -m instageo.data.raster_chip_creator \
    --records_file=demo_data_2/sample_records.gpkg \
    --raster_path=demo_data_2/seg_maps \
    --output_directory=chip_output_2/standard \
    --data_source=HLS \
    --chip_size=256 \
    --temporal_tolerance=5 \
    --cloud_coverage=30 \
    --num_steps=1 \
    --daytime_only=False \
    --qa_check=True

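To build intuition for `--temporal_tolerance`, the sketch below computes a symmetric search window around a record's date. It assumes the tolerance is expressed in days and applied on both sides of the observation date. That reading is an assumption here, not something this notebook confirms, and the helper `search_window` is hypothetical.

```python
from datetime import date, timedelta


def search_window(observation_date, tolerance_days):
    """Return the (start, end) dates of a symmetric window around a date.

    Assumption: tolerance is counted in days before and after the
    observation date given as an ISO string (YYYY-MM-DD).
    """
    center = date.fromisoformat(observation_date)
    delta = timedelta(days=tolerance_days)
    return center - delta, center + delta


start, end = search_window("2024-06-15", 5)
print(start, end)  # 2024-06-10 2024-06-20
```

Under this reading, the three demo records (June 15, 20, and 25) with a tolerance of 5 would have search windows that touch at their endpoints.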

## 4. Use Case 2: Bounding Box Feature Processing

**Use Case**: Processing Specific Geographic Regions

**Scenario**: You have a list of bounding boxes defining specific regions of interest and want to extract satellite data for them, without labels.

**This is useful when:**
- You have predefined regions of interest and only want to extract chips from satellite data


In [None]:
# Create sample bounding box features
def create_sample_bbox_features():
    """Create sample bounding box features for demonstration."""
    
    # Define bounding boxes (minx, miny, maxx, maxy)
    bbox_features = [
        [-73.96, 40.74, -73.94, 40.76],  
        [-122.15, 37.75, -122.13, 37.77],  
        [-122.35, 37.95, -122.33, 37.97],
    ]
    
    bbox_path = DEMO_DIR / "sample_bbox_features.json"
    with open(bbox_path, 'w') as f:
        json.dump(bbox_features, f, indent=2)
    
    return bbox_path

# Generate bounding box features
bbox_path = create_sample_bbox_features()
print(f"✅ Sample bounding box features created: {bbox_path}")

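Before passing a bounding box file to the command line tool, it can help to sanity-check the coordinates. The validator below is a hypothetical helper, not part of InstaGeo. It assumes boxes are `[minx, miny, maxx, maxy]` in longitude/latitude degrees, as in the sample above.

```python
def validate_bboxes(bboxes):
    """Return a list of human-readable problems found in the boxes.

    Assumes each box is [minx, miny, maxx, maxy] in degrees
    (longitude for x, latitude for y). An empty list means no problems.
    """
    problems = []
    for i, bbox in enumerate(bboxes):
        if len(bbox) != 4:
            problems.append(f"bbox {i}: expected 4 values, got {len(bbox)}")
            continue
        minx, miny, maxx, maxy = bbox
        if not (-180 <= minx < maxx <= 180):
            problems.append(f"bbox {i}: invalid longitude range [{minx}, {maxx}]")
        if not (-90 <= miny < maxy <= 90):
            problems.append(f"bbox {i}: invalid latitude range [{miny}, {maxy}]")
    return problems


sample = [
    [-73.96, 40.74, -73.94, 40.76],
    [-122.15, 37.75, -122.13, 37.77],
    [-122.35, 37.95, -122.33, 37.97],
]
print(validate_bboxes(sample))  # []

# A box with min and max longitude swapped is reported
swapped = [[-73.94, 40.74, -73.96, 40.76]]
print(validate_bboxes(swapped))
```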

### Bounding Box Feature Processing Command

In [None]:
!python -m instageo.data.raster_chip_creator \
    --is_bbox_feature=True \
    --bbox_feature_path=demo_data_2/sample_bbox_features.json \
    --output_directory=chip_output_2/bbox \
    --data_source=HLS \
    --chip_size=256 \
    --temporal_tolerance=5 \
    --num_steps=1 \
    --cloud_coverage=40 \
    --daytime_only=False \
    --qa_check=True \
    --date=10-10-2025

## Summary

### 🎯 Key Takeaways:

1. **Raster Chip Creator** is ideal for pixel-level label data from existing raster files
2. Supports both **standard raster processing** and **bounding box features**
3. Works with **HLS and Sentinel-2** data sources
4. Generates **chips** for ML training, plus **segmentation maps** in standard raster processing only
5. Includes **quality assurance checks** and flexible parameters

### 📊 Demo Results:

| Use Case | Input | Output | Chips Generated | Data Source |
|----------|-------|--------|------------------|-------------|
| Standard Processing | Records + Raster | 256×256 chips + seg maps | 3 chips | HLS |
| Bounding Box Features | BBox JSON | 256×256 chips | 3 chips | HLS |

### 📚 Next Steps:
- Try running the commands with your own raster data
- Experiment with different chip sizes and parameters
- Combine with `data_cleaner` for preprocessing
- Use `data_splitter` for train/validation/test splits

### 🔗 Related Demos:
- `chip_creator_demo.ipynb`: Point-based chip creation
- `data_cleaner_demo.ipynb`: Data cleaning and preprocessing
- `data_splitter_demo.ipynb`: Dataset splitting strategies
