# Snowflake Container Runtime for Distributed ML

This notebook sets up **true distributed training** using Snowflake's **Container Runtime** with compute pools - **no Docker builds required!**

## Simplified Infrastructure (Container Runtime):
1. **Compute Pools** - Multi-node clusters for distributed training
2. **Pre-built Images** - Use Anaconda/public ML images via Container Runtime
3. **Distributed Training** - Multi-node XGBoost, Ray ML, and more
4. **Resource Management** - Pools scale between `MIN_NODES` and `MAX_NODES` and suspend when idle
5. **Integration** - Seamless connection to Snowflake ML Registry

## Prerequisites:
- SPCS enabled in your Snowflake account
- A role that can create compute pools (`ACCOUNTADMIN`, or a role granted `CREATE COMPUTE POOL` on the account)
- **No Docker needed** - uses Container Runtime!
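
The privilege prerequisite can be checked before running any DDL. The sketch below is a hypothetical helper (`can_create_compute_pool` is not part of any Snowflake library) that scans rows shaped like the output of `SHOW GRANTS TO ROLE <role>`, assuming the `privilege` and `granted_on` columns of that output. It only sees grants made directly to the role, so privileges inherited through the role hierarchy are an acknowledged blind spot.

```python
def can_create_compute_pool(grant_rows):
    """Return True if any row grants CREATE COMPUTE POOL at the account level.

    grant_rows: iterable of mappings (or Snowpark Row objects) exposing
    'privilege' and 'granted_on', as in SHOW GRANTS TO ROLE output.
    """
    return any(
        row["privilege"].upper() == "CREATE COMPUTE POOL"
        and row["granted_on"].upper() == "ACCOUNT"
        for row in grant_rows
    )


# Illustrative rows only; real rows come from the SHOW GRANTS query.
sample_grants = [
    {"privilege": "USAGE", "granted_on": "WAREHOUSE"},
    {"privilege": "CREATE COMPUTE POOL", "granted_on": "ACCOUNT"},
]
print(can_create_compute_pool(sample_grants))
```

With a live session, the input would be something like `session.sql("SHOW GRANTS TO ROLE ML_ADMIN").collect()`, where `ML_ADMIN` is a placeholder role name.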

## Container Runtime Benefits:
- **No manual Docker builds**
- **Pre-built ML images** from Anaconda
- **Snowflake UI configuration**
- **Automatic dependency management**


In [None]:
# Environment Setup for SPCS
import sys
import os

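# Make the local snowflake_connection module importable:
# use ../src when running from the notebooks directory, ./src otherwise
current_dir = os.getcwd()
if "notebooks" in current_dir:
    src_path = os.path.abspath(os.path.join(current_dir, "..", "src"))
else:
    src_path = os.path.abspath(os.path.join(current_dir, "src"))

# Avoid adding the same entry again when the cell is re-run
if src_path not in sys.path:
    sys.path.append(src_path)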
print(f"Added to Python path: {src_path}")

from snowflake_connection import get_session

# Get Snowflake session with admin privileges
session = get_session()
print("SUCCESS: Snowflake connection established for SPCS setup")
print("Ready to configure distributed training infrastructure")


In [None]:
# 1. Create Compute Pools for Distributed Training
print("Setting up compute pools for distributed ML training...")

# Create GPU-enabled compute pool for intensive ML workloads
gpu_pool_sql = """
CREATE COMPUTE POOL IF NOT EXISTS ML_DISTRIBUTED_GPU_POOL
MIN_NODES = 1
MAX_NODES = 8
INSTANCE_FAMILY = GPU_NV_S
AUTO_RESUME = TRUE
AUTO_SUSPEND_SECS = 300
COMMENT = 'GPU compute pool for distributed ML training with Ray/XGBoost'
"""

# Create CPU compute pool for general distributed training
cpu_pool_sql = """
CREATE COMPUTE POOL IF NOT EXISTS ML_DISTRIBUTED_CPU_POOL
MIN_NODES = 2
MAX_NODES = 16
INSTANCE_FAMILY = CPU_X64_S
AUTO_RESUME = TRUE
AUTO_SUSPEND_SECS = 300
COMMENT = 'CPU compute pool for distributed ML training and data processing'
"""

try:
    # IF NOT EXISTS makes both statements safe to re-run, so report
    # the pools as ready rather than newly created
    session.sql(gpu_pool_sql).collect()
    print("SUCCESS: GPU compute pool ready: ML_DISTRIBUTED_GPU_POOL")
    
    session.sql(cpu_pool_sql).collect()
    print("SUCCESS: CPU compute pool ready: ML_DISTRIBUTED_CPU_POOL")
    
    # List compute pools
    pools = session.sql("SHOW COMPUTE POOLS").collect()
    print(f"\nAvailable compute pools ({len(pools)} total):")
    for pool in pools:
        if 'ML_DISTRIBUTED' in pool['name']:
            print(f"   - {pool['name']} - {pool['instance_family']} ({pool['state']})")
            
except Exception as e:
    print(f"WARNING: Compute pool setup error: {e}")
    print("Note: SPCS requires ACCOUNTADMIN privileges and may need to be enabled")


In [None]:
# 2. Verify Compute Pool Setup for Distributed Training
print("Checking compute pool status...")

# Check compute pool status
try:
    pools = session.sql("SHOW COMPUTE POOLS").collect()
    print("\nAvailable compute pools for distributed training:")
    
    ml_pools = []
    for pool in pools:
        if 'ML_DISTRIBUTED' in pool['name']:
            ml_pools.append(pool)
            print(f"   - {pool['name']}")
            print(f"      Instance Family: {pool['instance_family']}")
            print(f"      State: {pool['state']}")
            print(f"      Min/Max Nodes: {pool['min_nodes']}/{pool['max_nodes']}")
            print()
    
    if ml_pools:
        print("SUCCESS: Compute pools ready for distributed training!")
        print("Reference these pools by name when running Container Runtime workloads")
        print("No custom container images needed")
    else:
        print("WARNING: No ML compute pools found - check previous cell")
        
except Exception as e:
    print(f"WARNING: Error checking compute pools: {e}")

print("\nReady for Distributed Training!")
print("What's set up:")
print("   - Multi-node compute pools (CPU + GPU)")
print("   - Auto-scaling: GPU pool 1-8 nodes, CPU pool 2-16 nodes")
print("   - Direct integration with Snowflake ML APIs")
print("   - Native distributed XGBoost support")
print("\nNext: Run 05b_True_Distributed_Training.ipynb")


## Distributed ML Setup Complete!

### What We Set Up (Simple & Clean):

1. **Compute Pools Only**
   - `ML_DISTRIBUTED_GPU_POOL` - GPU instances for intensive ML
   - `ML_DISTRIBUTED_CPU_POOL` - CPU cluster for distributed training
   - **Auto-scaling** within each pool's limits (GPU: 1-8 nodes, CPU: 2-16 nodes)

2. **Native Snowflake ML Integration**
   - **No custom containers needed** - workloads run in Container Runtime's pre-built images on the compute pools
   - **No Docker complexity** - nothing to build, push, or maintain
   - **Built-in distributed training** - multi-node XGBoost, Ray, and more
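
The two pools differ only in name, node bounds, and instance family, so their DDL can be generated from one small spec. The sketch below is a hypothetical helper (`PoolSpec` and `build_create_pool_sql` are illustrative names, not part of any Snowflake library) that renders the same `CREATE COMPUTE POOL` clauses used in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSpec:
    """Hypothetical container for the pool settings used in this notebook."""
    name: str
    min_nodes: int
    max_nodes: int
    instance_family: str
    auto_suspend_secs: int = 300
    comment: str = ""


def build_create_pool_sql(spec: PoolSpec) -> str:
    """Render a CREATE COMPUTE POOL IF NOT EXISTS statement from a PoolSpec."""
    if not 1 <= spec.min_nodes <= spec.max_nodes:
        raise ValueError("need 1 <= min_nodes <= max_nodes")
    lines = [
        f"CREATE COMPUTE POOL IF NOT EXISTS {spec.name}",
        f"MIN_NODES = {spec.min_nodes}",
        f"MAX_NODES = {spec.max_nodes}",
        f"INSTANCE_FAMILY = {spec.instance_family}",
        "AUTO_RESUME = TRUE",
        f"AUTO_SUSPEND_SECS = {spec.auto_suspend_secs}",
    ]
    if spec.comment:
        # Double any single quotes so the comment stays a valid SQL literal
        escaped = spec.comment.replace("'", "''")
        lines.append(f"COMMENT = '{escaped}'")
    return "\n".join(lines)


# Same names, bounds, and instance families as the pools created earlier
GPU_POOL = PoolSpec("ML_DISTRIBUTED_GPU_POOL", 1, 8, "GPU_NV_S")
CPU_POOL = PoolSpec("ML_DISTRIBUTED_CPU_POOL", 2, 16, "CPU_X64_S")

print(build_create_pool_sql(CPU_POOL))
```

In the notebook, the rendered string would be passed to `session.sql(...).collect()` exactly like the hand-written statements. Keeping the bounds in one place also makes it harder for the documented node ranges to drift from the actual pool settings.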

### Why This Approach Works Better:

- **Simplest setup** - just compute pools
- **Native ML APIs** - Snowflake handles distribution automatically  
- **No container management** - focus on ML, not infrastructure
- **Enterprise ready** - integrated with ML Registry and Feature Store

### Next Step: Distributed Training

**Ready for notebook `05b_True_Distributed_Training.ipynb`!**

Once pointed at a compute pool, the Snowflake ML APIs will:
- **Use the compute pools** for distributed training
- **Scale across multiple nodes** (2-16 on the CPU pool, 1-8 on the GPU pool)
- **Handle data distribution** automatically
- **Integrate results** back to ML Registry

### Verify Setup:
- Check **Snowflake UI > Admin > Compute Pools**
- Look for `ML_DISTRIBUTED_CPU_POOL` and `ML_DISTRIBUTED_GPU_POOL`
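- Status should show "ACTIVE" or "IDLE" once nodes are running; "STARTING" while nodes are provisioned and "SUSPENDED" after the 300-second idle timeout are also normal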
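
The same check can be scripted against the rows returned by `SHOW COMPUTE POOLS`. This is a minimal sketch, assuming each row exposes `name` and `state` as in the cells above; `summarize_pools` and `all_ready` are hypothetical helper names, and counting only `ACTIVE` and `IDLE` as ready is a choice made here for illustration.

```python
EXPECTED_POOLS = ("ML_DISTRIBUTED_CPU_POOL", "ML_DISTRIBUTED_GPU_POOL")
READY_STATES = {"ACTIVE", "IDLE"}  # states in which the pool has running nodes


def summarize_pools(rows, expected=EXPECTED_POOLS):
    """Map each expected pool name to its state, or 'MISSING' if not listed."""
    states = {row["name"].upper(): row["state"].upper() for row in rows}
    return {name: states.get(name, "MISSING") for name in expected}


def all_ready(summary):
    """True only if every expected pool is in a ready state."""
    return all(state in READY_STATES for state in summary.values())


# Illustrative rows only; real rows come from the SHOW COMPUTE POOLS query.
sample_rows = [
    {"name": "ML_DISTRIBUTED_CPU_POOL", "state": "IDLE"},
    {"name": "ML_DISTRIBUTED_GPU_POOL", "state": "STARTING"},
]
summary = summarize_pools(sample_rows)
print(summary, all_ready(summary))
```

With a live session, `summarize_pools(session.sql("SHOW COMPUTE POOLS").collect())` would produce the same mapping from real pool states.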
