# API Examples - Quick Reference

Condensed notebook covering the core Platform endpoints (updated 13 DEC 2025):

1. Health Check
2. Container Check (Sync)
3. Process Vector
4. **Process Raster v2** (single file, ≤800 MB)
5. **Process Large Raster v2** (100 MB - 30 GB, tiled processing)
6. **Process Raster Collection v2** (≤20 files, each ≤800 MB)
7. Rejection Examples (size/count limit violations)

### Size Routing Summary

| File Size | Job Type | Notes |
|-----------|----------|-------|
| ≤800 MB | `process_raster_v2` | Standard COG conversion |
| 100 MB - 30 GB | `process_large_raster_v2` | Tiled COG workflow |
| Collection ≤20 files | `process_raster_collection_v2` | Each file must be ≤800 MB |
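
The single-file rows of the table can be sketched as a small client-side helper. This is a minimal sketch, not the platform's own routing logic: `choose_job_type` is a hypothetical name, the 30 GB limit is assumed to mean 30 × 1024 MB, and because files of 100-800 MB satisfy both rows, the sketch assumes the standard job is preferred whenever a file fits under 800 MB.

```python
# Hypothetical client-side helper; thresholds are taken from the table above.
MAX_STANDARD_MB = 800        # upper limit for process_raster_v2
MAX_LARGE_MB = 30 * 1024     # 30 GB expressed in MB (assumes 1 GB = 1024 MB)


def choose_job_type(size_mb):
    """Return the job type for a single raster of the given size in MB."""
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    if size_mb <= MAX_STANDARD_MB:
        # Assumption: prefer the standard job in the overlapping 100-800 MB range
        return "process_raster_v2"
    if size_mb <= MAX_LARGE_MB:
        return "process_large_raster_v2"
    raise ValueError(f"{size_mb:.0f} MB exceeds the 30 GB limit")


print(choose_job_type(25.8))           # size of dctest.tif
print(choose_job_type(11.16 * 1024))   # size of antigua.tif
```

Raising `ValueError` for oversized files mirrors the "not supported" behavior described for files above 30 GB, so a caller can catch it before submitting anything.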

## Setup

In [None]:
import requests
import json
import time

# =============================================================================
# CONFIGURATION - All variables defined here
# =============================================================================

# Function App Base URL
BASE_URL = "https://rmhazuregeoapi-a3dma3ctfdgngwf6.eastus-01.azurewebsites.net"

# Storage Containers (Bronze = raw input, Silver = processed output)
BRONZE_RASTERS_CONTAINER = "bronze-rasters"
BRONZE_VECTORS_CONTAINER = "bronze-vectors"
SILVER_COGS_CONTAINER = "silver-cogs"

# STAC Collections
RASTER_COLLECTION_ID = "system-rasters"
VECTOR_COLLECTION_ID = "system-vectors"

# PostGIS Schema
POSTGIS_SCHEMA = "geo"

# =============================================================================
# Helper Functions
# =============================================================================

def api_call(method, endpoint, data=None, params=None, timeout=30):
    """Make API call and return formatted response."""
    url = f"{BASE_URL}{endpoint}"
    headers = {"Content-Type": "application/json"}
    
    print(f"\n{'='*60}")
    print(f"{method} {endpoint}")
    print(f"{'='*60}")
    
    if data:
        print(f"\nRequest Body:")
        print(json.dumps(data, indent=2))
    
    try:
        if method == "GET":
            response = requests.get(url, params=params, timeout=timeout)
        elif method == "POST":
            response = requests.post(url, json=data, headers=headers, timeout=timeout)
        else:
            raise ValueError(f"Unsupported method: {method}")
        
        print(f"\nStatus: {response.status_code}")
        
        try:
            result = response.json()
            print(f"\nResponse:")
            print(json.dumps(result, indent=2, default=str))
            return result
        except ValueError:
            # Body was not valid JSON; fall back to the raw text
            print(f"\nResponse (text): {response.text[:500]}")
            return response.text
            
    except requests.exceptions.Timeout:
        print(f"\n❌ Request timed out (timeout={timeout}s)")
        return None
    except Exception as e:
        print(f"\n❌ Error: {e}")
        return None

def check_job_status(job_id, max_polls=20, poll_interval=5):
    """Poll job status until completion or timeout."""
    print(f"\n{'='*60}")
    print(f"Polling job: {job_id}")
    print(f"{'='*60}")
    
    result = None  # Keeps the final return defined even if max_polls is 0
    for i in range(max_polls):
        result = requests.get(f"{BASE_URL}/api/jobs/status/{job_id}", timeout=30).json()
        status = result.get("status", "unknown")
        stage = result.get("current_stage", "?")
        
        print(f"  [{i+1}/{max_polls}] Status: {status}, Stage: {stage}")
        
        if status in ["completed", "failed"]:
            print(f"\nFinal Result:")
            print(json.dumps(result, indent=2, default=str))
            return result
        
        time.sleep(poll_interval)
    
    print(f"\n⚠️ Polling timeout after {max_polls * poll_interval}s")
    return result

# Display configuration
print("=" * 60)
print("API CONFIGURATION")
print("=" * 60)
print(f"Base URL:              {BASE_URL}")
print(f"Bronze Rasters:        {BRONZE_RASTERS_CONTAINER}")
print(f"Bronze Vectors:        {BRONZE_VECTORS_CONTAINER}")
print(f"Silver COGs:           {SILVER_COGS_CONTAINER}")
print(f"Raster Collection:     {RASTER_COLLECTION_ID}")
print(f"Vector Collection:     {VECTOR_COLLECTION_ID}")
print(f"PostGIS Schema:        {POSTGIS_SCHEMA}")
print("=" * 60)

---
## 1. Health Check

Comprehensive system health check (~60s due to database, Service Bus, and storage checks).

In [None]:
# Health Check (takes ~60s)
result = api_call("GET", "/api/health", timeout=90)

---
## 2. Container Check (Sync)

Quick synchronous endpoint to list blobs in a container. No job queue - returns immediately.

**Parameters:**
- `suffix`: Filter by extension (e.g., `.tif`, `.geojson`)
- `metadata`: `true` (default) returns full blob info, `false` returns just names
- `limit`: Max blobs to return (default: 500, max: 10000)
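
The parameters above can be assembled into a query dictionary before calling the endpoint. This is a minimal sketch: `build_blob_query` is a hypothetical helper, not part of the API, and sending `metadata` as the lowercase strings `"true"`/`"false"` is an assumption based on the values listed above.

```python
def build_blob_query(suffix=None, metadata=True, limit=500):
    """Build query parameters for the container blob listing (hypothetical helper)."""
    if not 1 <= limit <= 10000:
        # The documented maximum is 10000; the lower bound of 1 is an assumption
        raise ValueError("limit must be between 1 and 10000")
    params = {"limit": limit, "metadata": "true" if metadata else "false"}
    if suffix:
        # Only send the filter when one was requested
        params["suffix"] = suffix
    return params


# Names only, GeoJSON files, first 50 blobs
print(build_blob_query(suffix=".geojson", metadata=False, limit=50))
```

The returned dictionary can be passed as the `params` argument of a `requests.get` call.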

In [None]:
# Container Check - Sync endpoint (returns immediately, no job queue)
# Note: this cell targets the QA container by name rather than the
# BRONZE_RASTERS_CONTAINER value defined in Setup

# List first 10 TIF files with full metadata
container_name = "rmhazuregeobronze"  # QA environment container
result = api_call("GET", f"/api/containers/{container_name}/blobs", 
                  params={"suffix": ".tif", "limit": 10, "metadata": "true"})

# Show summary
if result and isinstance(result, dict):
    count = result.get("count", 0)
    blobs = result.get("blobs", [])
    print(f"\n📊 Found {count} TIF files")
    if blobs:
        total_mb = sum(b.get("size_mb", 0) for b in blobs)
        print(f"📦 Total size: {total_mb:.2f} MB")

---
## 3. Process Vector

Submit a vector file (GeoJSON, Shapefile, GeoPackage) for ingestion into PostGIS.

In [None]:
# Submit Vector
vector_request = {
    "dataset_id": "test-vectors",
    "resource_id": "geojson-8",
    "version_id": "v1",
    "container_name": BRONZE_VECTORS_CONTAINER,
    "file_name": "8.geojson",
    "service_name": "Test GeoJSON 8"
}

result = api_call("POST", "/api/platform/submit", vector_request)
# api_call may return raw text for non-JSON responses, so require a dict
vector_job_id = result.get("job_id") if isinstance(result, dict) else None
print(f"\n📋 Job ID: {vector_job_id}")

In [None]:
# Check Vector Job Status
if vector_job_id:
    check_job_status(vector_job_id)
else:
    print("⚠️ No job_id from previous cell")

---
## 4. Process Raster (Single File)

Submit a single raster file for COG conversion and STAC cataloging.

### Size Limits (13 DEC 2025)

| Limit | Value | Behavior |
|-------|-------|----------|
| **Max file size** | 800 MB | Files >800MB rejected → use `process_large_raster_v2` |
| **Min file size** | None | Any size accepted |

**Pre-flight validation** automatically checks file size before processing.

### Test Data
- **dctest.tif** (25.8 MB) - Small RGB GeoTIFF, processes in ~22 seconds
- **antigua.tif** (11.16 GB) - Too large, will be rejected with error message

In [None]:
# Submit Single Raster via CoreMachine API (direct)
# Using dctest.tif (25.8 MB) - verified working 13 DEC 2025

raster_request = {
    "blob_name": "dctest.tif",
    "container_name": "rmhazuregeobronze"
}

result = api_call("POST", "/api/jobs/submit/process_raster_v2", raster_request)
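raster_job_id = result.get("job_id") if isinstance(result, dict) else None

# Show size metadata from pre-flight validation
if isinstance(result, dict) and "parameters" in result:
    params = result["parameters"]
    size_mb = params.get("_blob_size_mb")
    # Format only numeric sizes; ':.2f' raises ValueError on the string 'N/A'
    size_text = f"{size_mb:.2f} MB" if isinstance(size_mb, (int, float)) else "N/A"
    print(f"\n📏 Pre-flight Size Check:")
    print(f"   File size: {size_text}")
    print(f"   File exists: ✅")

print(f"\n📋 Job ID: {raster_job_id}")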

In [None]:
# Check Raster Job Status
if raster_job_id:
    check_job_status(raster_job_id)
else:
    print("⚠️ No job_id from previous cell")

---
## 5. Process Large Raster (100 MB - 30 GB)

Submit a large raster for tiled COG processing. This job uses a 5-stage workflow:
1. Generate tiling scheme
2. Extract tiles (sequential)
3. Create COGs (parallel)
4. Create MosaicJSON
5. Create STAC collection

### Size Limits (13 DEC 2025)

| Limit | Value | Behavior |
|-------|-------|----------|
| **Min file size** | 100 MB | Files <100MB should use `process_raster_v2` |
| **Max file size** | 30 GB | Files >30GB not supported |

### Test Data
- **antigua.tif** (11.16 GB) - Large Caribbean DEM, processes via tiling workflow

In [None]:
# Submit Large Raster via CoreMachine API (direct)
# Using antigua.tif (11.16 GB) - verified 13 DEC 2025
# Note: This is a long-running job (~30+ minutes for tiling and COG creation)

large_raster_request = {
    "blob_name": "antigua.tif",
    "container_name": "rmhazuregeobronze"
}

result = api_call("POST", "/api/jobs/submit/process_large_raster_v2", large_raster_request)
large_raster_job_id = result.get("job_id") if isinstance(result, dict) else None

# Show size metadata from pre-flight validation
if isinstance(result, dict) and "parameters" in result:
    params = result["parameters"]
    size_mb = params.get('_blob_size_mb', 0)
    print(f"\n📏 Pre-flight Size Check:")
    print(f"   File size: {size_mb:.2f} MB ({size_mb/1024:.2f} GB)")
    print(f"   Valid for large raster: {'✅' if 100 <= size_mb <= 30000 else '❌'}")

print(f"\n📋 Job ID: {large_raster_job_id}")

In [None]:
# Check Large Raster Job Status
if large_raster_job_id:
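    # Polls for up to 300 s (30 x 10 s). The job itself can run ~30+ minutes,
    # so re-run this cell (or raise max_polls) until it reports completed/failed.
    check_job_status(large_raster_job_id, max_polls=30, poll_interval=10)
else:
    print("⚠️ No job_id from previous cell")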

---
## 6. Process Raster Collection (Multi-File)

Submit multiple raster files to be processed as a collection with MosaicJSON.

### Size and Count Limits (13 DEC 2025)

| Limit | Value | Behavior |
|-------|-------|----------|
| **Max files per collection** | 20 | Collections with >20 files rejected |
| **Max individual file size** | 800 MB | Collections with ANY file >800MB rejected |
| **Min files per collection** | 2 | Single files should use `process_raster_v2` |

**Pre-flight validation order:**
1. **Collection count** - Rejected immediately if >20 files (before any blob checks)
2. **Individual file sizes** - Each blob checked in parallel; rejected if ANY exceeds 800MB
3. **File existence** - All blobs must exist in the container
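
The validation order above can be mirrored client-side to catch obvious problems before submitting. The sketch below is illustrative only and is not the server's implementation: `preflight_collection` is a hypothetical name, and `blob_sizes_mb` stands in for a mapping of blob name to size in MB (for example, built from the container listing in section 2), with missing blobs simply absent from the mapping.

```python
MAX_FILES = 20       # max files per collection
MIN_FILES = 2        # single files should use process_raster_v2
MAX_FILE_MB = 800    # max size of any individual file


def preflight_collection(blob_list, blob_sizes_mb):
    """Return a list of problems; an empty list means the request looks valid."""
    # 1. Collection count is checked first, before any per-blob lookups
    if len(blob_list) > MAX_FILES:
        return [f"{len(blob_list)} files exceeds the limit of {MAX_FILES}"]
    if len(blob_list) < MIN_FILES:
        return [f"{len(blob_list)} file(s); use process_raster_v2 for a single file"]

    # 2. Individual file sizes: reject if ANY known blob exceeds the limit
    too_large = [b for b in blob_list if blob_sizes_mb.get(b, 0) > MAX_FILE_MB]
    if too_large:
        return [f"{b} exceeds {MAX_FILE_MB} MB" for b in too_large]

    # 3. File existence: every blob must be present in the container
    missing = [b for b in blob_list if b not in blob_sizes_mb]
    return [f"{b} not found" for b in missing]


# Hypothetical size lookup with one oversized blob
sizes = {"a.tif": 778.0, "b.tif": 704.0, "huge.tif": 11427.8}
print(preflight_collection(["a.tif", "b.tif"], sizes))
print(preflight_collection(["a.tif", "huge.tif"], sizes))
print(preflight_collection(["a.tif", "missing.tif"], sizes))
```

Returning at the first failing stage keeps the sketch aligned with the stated order: a collection with too many files is reported without looking at any individual blob.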

### Size Metadata Captured
After validation, these fields are available in job parameters:
- `_blob_list_count` - Number of files
- `_blob_list_max_size_mb` - Largest file size
- `_blob_list_total_size_mb` - Total size of all files
- `_blob_list_largest_blob` - Name of largest file
- `_blob_list_has_large_raster` - True if any file >800MB

### Test Data
- **namangan/** folder (4 tiles, 1.6 GB total):
  - R1C1: 778 MB, R1C2: 704 MB, R2C1: 73 MB, R2C2: 65 MB
  - All under 800 MB limit ✅

In [None]:
# Submit Raster Collection via CoreMachine API (direct)
# Using namangan 4-tile collection (1.6 GB total) - verified 13 DEC 2025

collection_request = {
    "container_name": "rmhazuregeobronze",
    "blob_list": [
        "namangan/namangan14aug2019_R1C1cog.tif",  # 778 MB
        "namangan/namangan14aug2019_R1C2cog.tif",  # 704 MB
        "namangan/namangan14aug2019_R2C1cog.tif",  # 73 MB
        "namangan/namangan14aug2019_R2C2cog.tif"   # 65 MB
    ],
    "collection_id": "namangan-test"
}

result = api_call("POST", "/api/jobs/submit/process_raster_collection_v2", collection_request)
collection_job_id = result.get("job_id") if isinstance(result, dict) else None

# Show size metadata from pre-flight validation
if isinstance(result, dict) and "parameters" in result:
    params = result["parameters"]
    print(f"\n📏 Pre-flight Size Check:")
    print(f"   Files in collection: {params.get('_blob_list_count', 'N/A')}")
    print(f"   Largest file: {params.get('_blob_list_max_size_mb', 0):.2f} MB")
    print(f"   Total size: {params.get('_blob_list_total_size_mb', 0):.2f} MB")
    print(f"   Largest blob: {params.get('_blob_list_largest_blob', 'N/A')}")
    print(f"   Has large raster (>800MB): {'❌ Yes' if params.get('_blob_list_has_large_raster') else '✅ No'}")

print(f"\n📋 Job ID: {collection_job_id}")

In [None]:
# Check Raster Collection Job Status
if collection_job_id:
    check_job_status(collection_job_id, max_polls=30, poll_interval=10)  # Longer timeout for multi-file
else:
    print("⚠️ No job_id from previous cell")

---
## Quick Reference: Manual Job Status Check

Use this cell to check any job by ID.

In [None]:
# Manual Job Status Check
# Replace with your job_id
manual_job_id = "YOUR_JOB_ID_HERE"

if manual_job_id != "YOUR_JOB_ID_HERE":
    check_job_status(manual_job_id)
else:
    print("⚠️ Replace 'YOUR_JOB_ID_HERE' with an actual job_id")

---
## 7. Rejection Examples (Size/Count Limit Violations)

These examples demonstrate the pre-flight validation rejecting invalid requests.

In [None]:
# Example 1: Single raster too large (>800 MB)
# antigua.tif is 11.16 GB - should be rejected with message to use process_large_raster_v2

print("=" * 60)
print("TEST 1: Single raster exceeding 800 MB limit")
print("=" * 60)

large_single_request = {
    "blob_name": "antigua.tif",  # 11.16 GB
    "container_name": "rmhazuregeobronze"
}

result = api_call("POST", "/api/jobs/submit/process_raster_v2", large_single_request)
if isinstance(result, dict) and "error" in result:
    print(f"\n✅ Correctly rejected: {result.get('message', '')[:100]}...")

In [None]:
# Example 2: Collection with too many files (>20)
# Should be rejected before any blob checks are made

print("=" * 60)
print("TEST 2: Collection exceeding 20 file limit")
print("=" * 60)

too_many_files_request = {
    "container_name": "rmhazuregeobronze",
    "blob_list": [f"file{i}.tif" for i in range(21)],  # 21 files
    "collection_id": "test-too-many"
}

result = api_call("POST", "/api/jobs/submit/process_raster_collection_v2", too_many_files_request)
if isinstance(result, dict) and "error" in result:
    print(f"\n✅ Correctly rejected: {result.get('message', '')[:100]}...")

In [None]:
# Example 3: Collection with missing blob
# Should be rejected with list of missing files

print("=" * 60)
print("TEST 3: Collection with non-existent file")
print("=" * 60)

missing_blob_request = {
    "container_name": "rmhazuregeobronze",
    "blob_list": [
        "namangan/namangan14aug2019_R1C1cog.tif",  # exists
        "nonexistent_file_xyz123.tif"               # does not exist
    ],
    "collection_id": "test-missing"
}

result = api_call("POST", "/api/jobs/submit/process_raster_collection_v2", missing_blob_request)
if isinstance(result, dict) and "error" in result:
    print(f"\n✅ Correctly rejected: {result.get('message', '')[:100]}...")