# LifeArc POC - Demo 6: Programmatic Access & Authentication

This notebook demonstrates different authentication patterns for connecting to Snowflake programmatically.

**Use Cases:**
- ML pipelines accessing training data
- Automated ETL processes
- Service-to-service communication
- Integration with Azure ML Studio

## 1. Setup & Dependencies

In [None]:
# Install required packages
# !pip install "snowflake-connector-python[pandas]" snowflake-snowpark-python cryptography pandas requests

In [None]:
import os
import json
from pathlib import Path

import snowflake.connector
from snowflake.snowpark import Session
import pandas as pd

# For key-pair authentication
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization

## 2. Connection Configuration

Store your connection parameters securely. In production, use environment variables or a secrets manager.

In [None]:
# Configuration - replace with your values
# In production, load from environment variables or Azure Key Vault

SNOWFLAKE_CONFIG = {
    "account": os.getenv("SNOWFLAKE_ACCOUNT", "your_account_identifier"),
    "warehouse": os.getenv("SNOWFLAKE_WAREHOUSE", "DEMO_WH"),
    "database": os.getenv("SNOWFLAKE_DATABASE", "LIFEARC_POC"),
    "schema": os.getenv("SNOWFLAKE_SCHEMA", "UNSTRUCTURED_DATA"),
}

print(f"Target: {SNOWFLAKE_CONFIG['account']}/{SNOWFLAKE_CONFIG['database']}")
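`os.getenv` silently falls back to the placeholder defaults above. A mistyped environment variable name would therefore only surface as a confusing error at connect time. The sketch below fails fast instead. It assumes placeholders can be recognised by the `your_` prefix, which is a convention of this notebook's defaults and not a Snowflake rule. `find_unset_config` and `require_config` are hypothetical helpers.

```python
def find_unset_config(config: dict) -> list:
    """Return the keys whose values are empty or still a 'your_...' placeholder."""
    return [
        key
        for key, value in config.items()
        if not value or str(value).startswith("your_")
    ]


def require_config(config: dict) -> dict:
    """Raise before connecting if any parameter is still unset or a placeholder."""
    missing = find_unset_config(config)
    if missing:
        raise ValueError(f"Set these Snowflake settings before connecting: {missing}")
    return config


# Usage: require_config(SNOWFLAKE_CONFIG)
```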

## 3. Authentication Method 1: Key-Pair Authentication

**Recommended for service accounts and automated pipelines.**

### Generate RSA Key Pair (run in terminal):
```bash
# Generate encrypted private key
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8

# Generate public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# View public key (for Snowflake)
cat rsa_key.pub
```
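Next, the public key must be attached to the Snowflake user. According to Snowflake's key-pair documentation, the `RSA_PUBLIC_KEY` value is the base64 body of `rsa_key.pub` with the `-----BEGIN/END PUBLIC KEY-----` delimiter lines removed. The statement must be run by the user's owner or by a role with SECURITYADMIN or higher.

The sketch below builds that statement. It also computes the `SHA256:` fingerprint that `DESC USER` reports as `RSA_PUBLIC_KEY_FP`, which lets you check which key is registered. The function names are hypothetical.

```python
import base64
import hashlib


def public_key_body(pub_key_pem: str) -> str:
    """Strip the PEM delimiter lines and line breaks, leaving the base64 body."""
    return "".join(
        line.strip()
        for line in pub_key_pem.splitlines()
        if line.strip() and not line.startswith("-----")
    )


def build_set_public_key_sql(user: str, pub_key_pem: str) -> str:
    """Build the ALTER USER statement that registers the public key."""
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY='{public_key_body(pub_key_pem)}';"


def public_key_fingerprint(pub_key_pem: str) -> str:
    """SHA-256 of the DER-encoded public key, base64-encoded with a SHA256: prefix."""
    der = base64.b64decode(public_key_body(pub_key_pem))
    digest = hashlib.sha256(der).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii")


# Example (assumes rsa_key.pub was produced by the commands above):
# pem = open("rsa_key.pub").read()
# print(build_set_public_key_sql("LIFEARC_ML_SERVICE", pem))
# print(public_key_fingerprint(pem))  # compare with RSA_PUBLIC_KEY_FP in DESC USER
```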

In [None]:
def load_private_key(key_path: str, passphrase: str = None) -> bytes:
    """
    Load and decode RSA private key for Snowflake authentication.
    
    Args:
        key_path: Path to the .p8 private key file
        passphrase: Passphrase for encrypted key (optional)
    
    Returns:
        Private key bytes in PKCS8 DER format
    """
    with open(key_path, "rb") as key_file:
        p_key = serialization.load_pem_private_key(
            key_file.read(),
            password=passphrase.encode() if passphrase else None,
            backend=default_backend()
        )
    
    # Convert to DER format for Snowflake
    pkb = p_key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption()
    )
    
    return pkb

print("Key loading function defined.")
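On compute where `openssl` is not installed (for example some managed or Windows environments), a comparable pair of files can be produced from Python with the `cryptography` package. `generate_key_files` is a hypothetical helper. Unlike the shell commands, it lets the library choose the PKCS8 encryption cipher instead of forcing des3. The resulting `.p8` file is an encrypted PKCS8 PEM, which is the format `load_private_key` reads.

```python
import os

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa


def generate_key_files(private_path: str, public_path: str, passphrase: str) -> None:
    """Write an encrypted PKCS8 private key and its PEM public key.

    `passphrase` must be non-empty; cryptography rejects empty passwords.
    """
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

    private_pem = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.BestAvailableEncryption(passphrase.encode()),
    )
    public_pem = key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )

    with open(private_path, "wb") as f:
        f.write(private_pem)
    os.chmod(private_path, 0o600)  # owner-only access on POSIX systems

    with open(public_path, "wb") as f:
        f.write(public_pem)
```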

In [None]:
def connect_with_keypair(
    user: str,
    account: str,
    private_key_path: str,
    passphrase: str = None,
    **kwargs
) -> snowflake.connector.SnowflakeConnection:
    """
    Connect to Snowflake using key-pair authentication.
    
    Args:
        user: Snowflake username (service account)
        account: Snowflake account identifier
        private_key_path: Path to RSA private key file
        passphrase: Key passphrase (if encrypted)
        **kwargs: Additional connection parameters
    
    Returns:
        Active Snowflake connection
    """
    private_key = load_private_key(private_key_path, passphrase)
    
    conn = snowflake.connector.connect(
        user=user,
        account=account,
        private_key=private_key,
        **kwargs
    )
    
    print(f"✓ Connected as {user} using key-pair authentication")
    return conn

# Example usage (uncomment when you have a key file):
# conn = connect_with_keypair(
#     user="LIFEARC_ML_SERVICE",
#     account=SNOWFLAKE_CONFIG["account"],
#     private_key_path="./rsa_key.p8",
#     passphrase="your_passphrase",
#     warehouse=SNOWFLAKE_CONFIG["warehouse"],
#     database=SNOWFLAKE_CONFIG["database"],
#     schema=SNOWFLAKE_CONFIG["schema"]
# )
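To keep credentials out of notebook code, the connection arguments can be read from the variables listed in the Environment Variables Template (section 10). In this minimal sketch, `keypair_params_from_env` is a hypothetical helper. Its output matches the keyword arguments of `connect_with_keypair` and `create_snowpark_session_keypair`.

```python
import os
from typing import Mapping, Optional


def keypair_params_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect key-pair connection arguments from SNOWFLAKE_* variables.

    Raises KeyError naming the first required variable that is missing.
    """
    env = os.environ if env is None else env
    params = {
        "user": env["SNOWFLAKE_ML_USER"],
        "account": env["SNOWFLAKE_ACCOUNT"],
        "private_key_path": env["SNOWFLAKE_ML_PRIVATE_KEY_PATH"],
        # Only needed when the key file is encrypted
        "passphrase": env.get("SNOWFLAKE_ML_KEY_PASSPHRASE"),
    }
    # Optional session context; unset values fall back to the user's defaults
    for key, var in (
        ("warehouse", "SNOWFLAKE_WAREHOUSE"),
        ("database", "SNOWFLAKE_DATABASE"),
        ("schema", "SNOWFLAKE_SCHEMA"),
    ):
        if env.get(var):
            params[key] = env[var]
    return params


# Usage: conn = connect_with_keypair(**keypair_params_from_env())
```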

## 4. Authentication Method 2: Snowpark Session with Key-Pair

**Best for data science workloads using DataFrame API.**

In [None]:
def create_snowpark_session_keypair(
    user: str,
    account: str,
    private_key_path: str,
    passphrase: str = None,
    **kwargs
) -> Session:
    """
    Create Snowpark session with key-pair authentication.
    
    Returns:
        Active Snowpark Session
    """
    private_key = load_private_key(private_key_path, passphrase)
    
    connection_parameters = {
        "account": account,
        "user": user,
        "private_key": private_key,
        **kwargs
    }
    
    session = Session.builder.configs(connection_parameters).create()
    print(f"✓ Snowpark session created for {user}")
    return session

# Example usage:
# session = create_snowpark_session_keypair(
#     user="LIFEARC_ML_SERVICE",
#     account=SNOWFLAKE_CONFIG["account"],
#     private_key_path="./rsa_key.p8",
#     passphrase="your_passphrase",
#     warehouse=SNOWFLAKE_CONFIG["warehouse"],
#     database=SNOWFLAKE_CONFIG["database"],
#     schema=SNOWFLAKE_CONFIG["schema"]
# )

## 5. Authentication Method 3: External Browser (Interactive)

**For interactive development with SSO/Azure AD.**

In [None]:
def connect_with_browser_sso(
    user: str,
    account: str,
    **kwargs
) -> snowflake.connector.SnowflakeConnection:
    """
    Connect using browser-based SSO authentication.
    Opens browser for Azure AD / Okta / other IdP login.
    
    Ideal for interactive data science work.
    """
    conn = snowflake.connector.connect(
        user=user,
        account=account,
        authenticator='externalbrowser',
        **kwargs
    )
    
    print(f"✓ Connected as {user} via browser SSO")
    return conn

# Example:
# conn = connect_with_browser_sso(
#     user="your.email@lifearc.org",
#     account=SNOWFLAKE_CONFIG["account"],
#     warehouse=SNOWFLAKE_CONFIG["warehouse"]
# )

## 6. Authentication Method 4: OAuth Token

**For application-to-Snowflake integration with token refresh.**

In [None]:
def connect_with_oauth_token(
    user: str,
    account: str,
    oauth_token: str,
    **kwargs
) -> snowflake.connector.SnowflakeConnection:
    """
    Connect using OAuth access token.
    
    Token should be obtained from your IdP (Azure AD, Okta, etc.)
    using the appropriate OAuth flow.
    """
    conn = snowflake.connector.connect(
        user=user,
        account=account,
        authenticator='oauth',
        token=oauth_token,
        **kwargs
    )
    
    print(f"✓ Connected as {user} via OAuth token")
    return conn

# Example OAuth token retrieval (Azure AD client-credentials flow)
def get_azure_ad_token(
    client_id: str,
    client_secret: str,
    tenant_id: str,
    scope: str
) -> str:
    """
    Get an OAuth access token from Azure AD for Snowflake External OAuth.
    
    `scope` is the Application ID URI of the Azure AD app registration that
    represents Snowflake, suffixed with "/.default"
    (for example "api://<snowflake-app-id>/.default").
    """
    import requests
    
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    
    response = requests.post(token_url, data={
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        'scope': scope
    }, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of a KeyError below
    
    return response.json()['access_token']
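Azure AD access tokens are short-lived, and this section is about token refresh. One way to handle that is a small cache that refetches the token shortly before it expires. This sketch assumes a fetch function that returns the full token response (`access_token` plus `expires_in` in seconds). That could be a hypothetical variant of `get_azure_ad_token` returning `response.json()` instead of only the access token. `CachedToken` is not part of any Snowflake or Azure library.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Cache an OAuth access token and refetch it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # returns {"access_token": ..., "expires_in": ...}
        self._skew = skew_seconds    # refresh this many seconds before expiry
        self._clock = clock          # injectable for testing
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a token valid for at least `skew_seconds` more, refetching if needed."""
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            payload = self._fetch()
            self._token = payload["access_token"]
            # float() accepts both int and string forms of expires_in
            self._expires_at = self._clock() + float(payload["expires_in"])
        return self._token


# Usage (each new connection gets a fresh-enough token):
# conn = connect_with_oauth_token(user, account, oauth_token=cache.get())
```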

## 7. Connection Helper Class

**A reusable connection manager that wraps the authentication methods above.**

In [None]:
class SnowflakeConnectionManager:
    """
    Unified connection manager supporting multiple authentication methods.
    
    Usage:
        manager = SnowflakeConnectionManager(account, warehouse, database)
        
        # For service accounts
        conn = manager.connect_keypair(user, key_path, passphrase)
        
        # For interactive use
        conn = manager.connect_sso(user)
    """
    
    def __init__(
        self,
        account: str,
        warehouse: str = None,
        database: str = None,
        schema: str = None
    ):
        self.account = account
        self.warehouse = warehouse
        self.database = database
        self.schema = schema
        self._connection = None
        self._session = None
    
    def _get_common_params(self) -> dict:
        """Get common connection parameters."""
        params = {'account': self.account}
        if self.warehouse:
            params['warehouse'] = self.warehouse
        if self.database:
            params['database'] = self.database
        if self.schema:
            params['schema'] = self.schema
        return params
    
    def connect_keypair(
        self,
        user: str,
        private_key_path: str,
        passphrase: str = None
    ) -> snowflake.connector.SnowflakeConnection:
        """Connect using RSA key-pair authentication."""
        private_key = load_private_key(private_key_path, passphrase)
        
        params = self._get_common_params()
        params.update({
            'user': user,
            'private_key': private_key
        })
        
        self._connection = snowflake.connector.connect(**params)
        print(f"✓ Connected as {user} (key-pair)")
        return self._connection
    
    def connect_sso(self, user: str) -> snowflake.connector.SnowflakeConnection:
        """Connect using browser-based SSO."""
        params = self._get_common_params()
        params.update({
            'user': user,
            'authenticator': 'externalbrowser'
        })
        
        self._connection = snowflake.connector.connect(**params)
        print(f"✓ Connected as {user} (SSO)")
        return self._connection
    
    def create_snowpark_session(
        self,
        user: str,
        private_key_path: str = None,
        passphrase: str = None
    ) -> Session:
        """Create a Snowpark session (key-pair if private_key_path is given, else browser SSO)."""
        params = self._get_common_params()
        params['user'] = user
        
        if private_key_path:
            params['private_key'] = load_private_key(private_key_path, passphrase)
        else:
            params['authenticator'] = 'externalbrowser'
        
        self._session = Session.builder.configs(params).create()
        print(f"✓ Snowpark session created for {user}")
        return self._session
    
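    def execute_query(self, sql: str) -> pd.DataFrame:
        """Run a SELECT query on the active connection and return a DataFrame."""
        if not self._connection:
            raise RuntimeError("No active connection. Call connect_* first.")
        
        # The context manager closes the cursor even if the query fails
        with self._connection.cursor() as cursor:
            cursor.execute(sql)
            # Requires the pandas extra: snowflake-connector-python[pandas]
            return cursor.fetch_pandas_all()
    
    def close(self):
        """Close the connection and Snowpark session, if open."""
        if self._connection:
            self._connection.close()
            self._connection = None
            print("✓ Connection closed")
        if self._session:
            self._session.close()
            self._session = None
            print("✓ Session closed")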

print("SnowflakeConnectionManager class defined.")
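Because `close()` is only called explicitly, an exception partway through a pipeline can leave the connection open. A minimal sketch that adds `with` support to any class that defines `close()` is shown below. `ClosingContextMixin` and `ManagedConnectionManager` are hypothetical names.

```python
class ClosingContextMixin:
    """Add `with` support to any class that defines close()."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress the original exception


# Usage with the manager above:
# class ManagedConnectionManager(ClosingContextMixin, SnowflakeConnectionManager):
#     pass
#
# with ManagedConnectionManager(**SNOWFLAKE_CONFIG) as manager:
#     manager.connect_keypair(user, key_path, passphrase)
#     df = manager.execute_query("SELECT CURRENT_VERSION()")
```

`contextlib.closing(manager)` from the standard library gives the same guarantee without a subclass. The mixin simply makes the behaviour part of the class.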

## 8. Example: ML Pipeline Data Access

**Demonstrates typical ML pipeline pattern for accessing training data.**

In [None]:
def fetch_training_data(
    session: Session,
    table_name: str,
    sample_size: int = None
) -> pd.DataFrame:
    """
    Fetch training data from Snowflake for ML model.
    
    Args:
        session: Active Snowpark session
        table_name: Source table name
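        sample_size: Optional number of randomly sampled rows
    
    Returns:
        Pandas DataFrame with training data
    """
    df = session.table(table_name)
    
    if sample_size:
        # sample(n=...) draws a random subset in Snowflake; limit() would
        # return an arbitrary, possibly biased, slice of the table
        df = df.sample(n=sample_size)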
    
    # Convert to pandas for ML frameworks
    pdf = df.to_pandas()
    
    print(f"✓ Fetched {len(pdf)} rows from {table_name}")
    return pdf

# Example usage:
# training_data = fetch_training_data(
#     session=session,
#     table_name="CLINICAL_TRIAL_RESULTS",
#     sample_size=1000
# )
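`to_pandas()` materialises the whole result in local memory, which may not fit for large training tables. Snowpark DataFrames also provide `to_pandas_batches()`, which yields the result as a sequence of pandas DataFrames. The sketch below streams those batches to a handler. Memory stays at roughly one batch, plus whatever the handler keeps. Batch boundaries follow the connector's result chunks rather than a size you choose. `process_in_batches` and the handler are hypothetical.

```python
from typing import Callable

import pandas as pd


def process_in_batches(snowpark_df, handle_batch: Callable[[pd.DataFrame], None]) -> int:
    """Stream a Snowpark DataFrame to pandas one batch at a time.

    Returns the total number of rows passed to `handle_batch`.
    """
    total = 0
    for batch in snowpark_df.to_pandas_batches():
        handle_batch(batch)  # e.g. incremental training or writing local Parquet
        total += len(batch)
    return total


# Usage:
# rows = process_in_batches(session.table("CLINICAL_TRIAL_RESULTS"), handle_batch=my_handler)
```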

## 9. Best Practices Summary

### Authentication Method Selection

| Use Case | Recommended Method | Why |
|----------|-------------------|-----|
| Automated ML pipelines | Key-Pair | No passwords, rotatable, auditable |
| ETL/Data Integration | Key-Pair | Same as above |
| Interactive Data Science | SSO/Browser | Uses existing Azure AD identity |
| Web Applications | OAuth | Token refresh, scoped access |
| Azure ML Studio | External OAuth | Integrates with Azure AD |

### Security Checklist

- [ ] Use dedicated service accounts per application
- [ ] Store private keys in Azure Key Vault (not in code)
- [ ] Implement key rotation procedures
- [ ] Apply network policies to service accounts
- [ ] Monitor login history for anomalies
- [ ] Use least-privilege role grants
- [ ] Enable MFA for interactive users
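For the key-rotation item, note that Snowflake lets a user hold two public keys at once (`RSA_PUBLIC_KEY` and `RSA_PUBLIC_KEY_2`). A new key can therefore be added before the old one is removed, with no window in which clients cannot log in. This sketch lists the statements in order. `rotation_plan` is a hypothetical helper, and the statements need a role that is allowed to alter the user.

```python
def rotation_plan(user: str, new_pub_key_body: str, active_slot: int = 1) -> list:
    """Return the ALTER USER statements for a zero-downtime key rotation.

    `active_slot` is the slot (1 or 2) holding the key currently in use.
    `new_pub_key_body` is the base64 key body without PEM delimiter lines.
    """
    slots = {1: "RSA_PUBLIC_KEY", 2: "RSA_PUBLIC_KEY_2"}
    if active_slot not in slots:
        raise ValueError("active_slot must be 1 or 2")
    old_slot = slots[active_slot]
    new_slot = slots[3 - active_slot]
    return [
        # Step 1: register the new key in the free slot
        f"ALTER USER {user} SET {new_slot}='{new_pub_key_body}';",
        # Step 2 (run only after every client uses the new private key):
        f"ALTER USER {user} UNSET {old_slot};",
    ]
```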

## 10. Environment Variables Template

Create a `.env` file (never commit to git):

```bash
# Snowflake Configuration
SNOWFLAKE_ACCOUNT=your_account_identifier
SNOWFLAKE_WAREHOUSE=DEMO_WH
SNOWFLAKE_DATABASE=LIFEARC_POC
SNOWFLAKE_SCHEMA=UNSTRUCTURED_DATA

# Service Account (ML Pipeline)
SNOWFLAKE_ML_USER=LIFEARC_ML_SERVICE
SNOWFLAKE_ML_PRIVATE_KEY_PATH=/path/to/rsa_key.p8
SNOWFLAKE_ML_KEY_PASSPHRASE=your_passphrase

# Azure AD OAuth (if using)
AZURE_TENANT_ID=your_tenant_id
AZURE_CLIENT_ID=your_client_id
AZURE_CLIENT_SECRET=your_client_secret
```