# Snowflake Functions & Procedures
## Practical Design Guide

| Section | Topic |
|---------|-------|
| 1 | Inline vs. Staged Handler Code |
| 2 | Snowflake-Imposed Constraints |
| 3 | Naming and Overloading |
| 4 | Defining Arguments |
| 5 | Data Type Mappings |
| 6 | Managing Dependencies |

## Setup

In [None]:
from snowflake.snowpark.context import get_active_session

session = get_active_session()
print(f"Connected as: {session.get_current_user()}")

In [None]:
db = session.get_current_database()
session.sql(f"CREATE SCHEMA IF NOT EXISTS {db}.FUNC_PROC_DEMO").collect()
session.sql(f"USE SCHEMA {db}.FUNC_PROC_DEMO").collect()
session.sql("CREATE STAGE IF NOT EXISTS CODE_STAGE").collect()
print(f"Using {db}.FUNC_PROC_DEMO")

---
## Section 1: Inline vs. Staged Handler Code

| Approach | Best For | Pros | Cons |
|----------|----------|------|------|
| **Inline** | Simple, short functions | Easy to deploy, self-contained | Harder to unit test; versioned only as part of the DDL script |
| **Staged** | Complex logic, shared code | Reusable, testable, version controlled | Extra deployment step |

### Example 1a: Inline Python UDF

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION mask_email_inline(email VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'mask_email'
AS $$
def mask_email(email):
    if not email or '@' not in email:
        return email
    local, domain = email.split('@', 1)
    if len(local) <= 2:
        masked_local = '*' * len(local)
    else:
        masked_local = local[0] + '*' * (len(local) - 2) + local[-1]
    return f"{masked_local}@{domain}"
$$
""").collect()
print("Created inline UDF: mask_email_inline")

In [None]:
session.sql("""
SELECT 
    'sarah.johnson@email.com' as original,
    mask_email_inline('sarah.johnson@email.com') as masked
""").to_pandas()

### Example 1b: Staged Python UDF

For complex or reusable code, store the handler in a Python file on a stage and reference it with `IMPORTS`.

In [None]:
handler_code = '''
import re

def mask_phone(phone):
    """Mask phone number, keeping only last 4 digits."""
    if not phone:
        return phone
    digits = re.sub(r"\\D", "", phone)
    if len(digits) < 4:
        return "*" * len(digits)
    return "***-***-" + digits[-4:]

def validate_email(email):
    """Check if email format is valid."""
    if not email:
        return False
    pattern = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    return bool(re.match(pattern, email))
'''

with open('/tmp/data_utils.py', 'w') as f:
    f.write(handler_code)

session.file.put('/tmp/data_utils.py', '@CODE_STAGE', auto_compress=False, overwrite=True)
print("Uploaded handler to @CODE_STAGE/data_utils.py")

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION mask_phone_staged(phone VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
IMPORTS = ('@CODE_STAGE/data_utils.py')
HANDLER = 'data_utils.mask_phone'
""").collect()

session.sql("""
CREATE OR REPLACE FUNCTION validate_email_staged(email VARCHAR)
RETURNS BOOLEAN
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
IMPORTS = ('@CODE_STAGE/data_utils.py')
HANDLER = 'data_utils.validate_email'
""").collect()

print("Created staged UDFs: mask_phone_staged, validate_email_staged")

In [None]:
session.sql("""
SELECT 
    '555-123-4567' as original_phone,
    mask_phone_staged('555-123-4567') as masked_phone,
    'test@example.com' as email,
    validate_email_staged('test@example.com') as is_valid,
    validate_email_staged('invalid-email') as is_invalid
""").to_pandas()

---
## Section 2: Snowflake-Imposed Constraints

| Constraint | Limit | Why It Matters |
|------------|-------|----------------|
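| **Memory** | Depends on warehouse size and type; Snowpark-optimized warehouses offer more memory per node | Large in-memory data structures may fail |
| **Execution Time** | Bounded by the `STATEMENT_TIMEOUT_IN_SECONDS` parameter | Long-running operations may time out |
| **Network Access** | Blocked by default | Requires an External Access Integration |
| **File System** | Read-only except `/tmp` | Write temporary files to `/tmp` only |
| **Packages** | Snowflake Anaconda channel (via `PACKAGES`) | Other packages need special handling, e.g. staging pure-Python code via `IMPORTS` |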

### Example 2a: Using /tmp for Temporary Files

In [None]:
session.sql("""
CREATE OR REPLACE PROCEDURE process_with_temp_file(input_text VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'process_text'
AS $$
import os
import tempfile

def process_text(session, input_text):
    with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False, dir='/tmp') as f:
        f.write(input_text)
        temp_path = f.name
    
    with open(temp_path, 'r') as f:
        content = f.read()
    
    os.remove(temp_path)
    
    return f"Processed {len(content)} characters from temp file"
$$
""").collect()
print("Created procedure: process_with_temp_file")

In [None]:
session.sql("CALL process_with_temp_file('Hello from Understood!')").to_pandas()
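Instead of `delete=False` followed by a manual `os.remove`, you can use `tempfile.TemporaryDirectory`, which removes the directory even if processing raises an exception. This sketch covers the handler body only. It assumes that Python's default temp location resolves to the writable `/tmp` inside Snowflake's sandbox. For a local run, the unused `session` argument is passed as `None`:

```python
import os
import tempfile


def process_text(session, input_text):
    # The directory and its contents are deleted when the with-block exits,
    # including when an exception is raised inside it.
    with tempfile.TemporaryDirectory() as tmp_dir:
        temp_path = os.path.join(tmp_dir, 'input.txt')
        with open(temp_path, 'w') as f:
            f.write(input_text)
        with open(temp_path, 'r') as f:
            content = f.read()
    return f"Processed {len(content)} characters from temp file"


print(process_text(None, 'Hello from Understood!'))  # Processed 22 characters from temp file
```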

---
## Section 3: Naming and Overloading

### Rules
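- Unquoted names are **case-insensitive** (stored in uppercase); double-quoted names are case-sensitive
- Use **snake_case** for consistency (a convention, not a requirement)
- Same name with a different number or types of arguments = **overloading**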

### Example 3: Overloaded Functions

In [None]:
# In Snowflake's format models, MMMM is the full month name
session.sql("CREATE OR REPLACE FUNCTION format_date_display(d DATE) RETURNS VARCHAR LANGUAGE SQL AS $$ TO_CHAR(d, 'MMMM DD, YYYY') $$").collect()
session.sql("CREATE OR REPLACE FUNCTION format_date_display(ts TIMESTAMP) RETURNS VARCHAR LANGUAGE SQL AS $$ TO_CHAR(ts, 'MMMM DD, YYYY HH12:MI AM') $$").collect()
session.sql("CREATE OR REPLACE FUNCTION format_date_display(date_str VARCHAR) RETURNS VARCHAR LANGUAGE SQL AS $$ TO_CHAR(TRY_TO_DATE(date_str), 'MMMM DD, YYYY') $$").collect()
print("Created 3 overloaded functions: format_date_display(DATE/TIMESTAMP/VARCHAR)")

In [None]:
session.sql("""
SELECT 
    format_date_display(CURRENT_DATE()) as from_date,
    format_date_display(CURRENT_TIMESTAMP()::TIMESTAMP) as from_timestamp,
    format_date_display('2025-12-25') as from_string
""").to_pandas()

---
## Section 4: Defining Arguments

| Feature | Syntax | Example |
|---------|--------|---------|
| **Required** | `arg_name TYPE` | `name VARCHAR` |
| **Default** | `arg_name TYPE DEFAULT value` | `max_rows INT DEFAULT 10` |
| **Named Call** | `func(arg_name => value)` | `func(max_rows => 5)` |

### Example 4: Default Arguments

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION truncate_text(
    text VARCHAR,
    max_length INT DEFAULT 100,
    suffix VARCHAR DEFAULT '...'
)
RETURNS VARCHAR
LANGUAGE SQL
AS $$
    CASE 
        WHEN LENGTH(text) <= max_length THEN text
        ELSE LEFT(text, max_length - LENGTH(suffix)) || suffix
    END
$$
""").collect()
print("Created function with default arguments")

In [None]:
session.sql("""
SELECT 
    truncate_text('This sentence is under 100 characters, so the defaults leave it unchanged.') as with_defaults,
    truncate_text('This is a long text that will be truncated.', 30) as custom_length,
    truncate_text('This is a long text.', 15, ' [more]') as custom_suffix
""").to_pandas()

---
## Section 5: Data Type Mappings

| SQL Type | Python Type |
|----------|-------------|
| `VARCHAR` | `str` |
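| `NUMBER` with scale 0 (e.g. `INT`) | `int` |
| `NUMBER` with scale > 0 | `decimal.Decimal` |
| `FLOAT` | `float` |
| `BOOLEAN` | `bool` |
| `DATE` | `datetime.date` |
| `VARIANT` | `dict`, `list`, `str`, `int`, `float`, or `bool` |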
| `ARRAY` | `list` |

### Example 5a: Working with VARIANT (JSON)

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION parse_student_profile(profile VARIANT)
RETURNS VARIANT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'parse_profile'
AS $$
def parse_profile(profile):
    if not profile:
        return None
    
    result = {
        'name': profile.get('name', 'Unknown'),
        'learning_differences': profile.get('learning_differences', []),
        'difference_count': len(profile.get('learning_differences', [])),
        'risk_level': 'high' if len(profile.get('learning_differences', [])) > 2 else 'standard'
    }
    return result
$$
""").collect()
print("Created VARIANT-handling UDF")

In [None]:
session.sql("""
SELECT parse_student_profile(
    PARSE_JSON('{"name": "Emma", "learning_differences": ["dyslexia", "ADHD", "dyscalculia"]}')
) as enriched_profile
""").to_pandas()

### Example 5b: Working with ARRAY

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION analyze_scores(scores ARRAY)
RETURNS VARIANT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
HANDLER = 'analyze'
AS $$
def analyze(scores):
    if not scores:
        return {'error': 'No scores provided'}
    
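    numeric_scores = [s for s in scores if isinstance(s, (int, float))]
    # Guard against arrays with no numeric values (min/max/division would fail)
    if not numeric_scores:
        return {'error': 'No numeric scores provided'}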
    
    return {
        'count': len(numeric_scores),
        'min': min(numeric_scores),
        'max': max(numeric_scores),
        'average': round(sum(numeric_scores) / len(numeric_scores), 2),
        'passing': len([s for s in numeric_scores if s >= 70])
    }
$$
""").collect()
print("Created ARRAY-handling UDF")

In [None]:
session.sql("SELECT analyze_scores(ARRAY_CONSTRUCT(85, 92, 67, 78, 91, 55, 88)) as score_analysis").to_pandas()
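The `isinstance(s, (int, float))` filter has a Python-specific catch: `bool` is a subclass of `int`. A JSON `true` inside the ARRAY would arrive as Python `True` and be counted as a score of 1. This sketch shows a stricter filter, assuming booleans should be ignored rather than scored:

```python
def numeric_only(values):
    """Keep ints and floats from an ARRAY, excluding bools (bool subclasses int)."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


mixed = [85, True, 'absent', None, 92.5]
print([v for v in mixed if isinstance(v, (int, float))])  # [85, True, 92.5]
print(numeric_only(mixed))                                 # [85, 92.5]
```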

---
## Section 6: Managing Dependencies

| Method | Use Case | Example |
|--------|----------|---------|
| **PACKAGES** | Anaconda packages | `PACKAGES = ('pandas', 'numpy')` |
| **IMPORTS** | Staged `.py` files or `.zip` archives | `IMPORTS = ('@stage/utils.py')` |

### Example 6a: Using PACKAGES (Anaconda)

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION calculate_statistics(input_values ARRAY)
RETURNS VARIANT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('numpy')
HANDLER = 'calc_stats'
AS $$
import numpy as np

def calc_stats(input_values):
    if not input_values:
        return None
    
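    arr = np.array([v for v in input_values if v is not None], dtype=float)
    # Every element may have been NULL; np.percentile cannot handle an empty array
    if arr.size == 0:
        return None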
    
    return {
        'mean': float(np.mean(arr)),
        'median': float(np.median(arr)),
        'std': float(np.std(arr)),
        'percentile_25': float(np.percentile(arr, 25)),
        'percentile_75': float(np.percentile(arr, 75))
    }
$$
""").collect()
print("Created UDF with numpy package")

In [None]:
session.sql("SELECT calculate_statistics(ARRAY_CONSTRUCT(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)) as stats").to_pandas()
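For statistics this simple, the standard library may be enough, and you can drop the `PACKAGES` clause entirely. This sketch uses Python's `statistics` module instead of numpy, which is a swapped-in technique and not what the UDF above does. `pstdev` matches `np.std`'s population default. `quantiles(..., method='inclusive')` uses the same linear interpolation as `np.percentile`'s default. One difference: the sketch needs at least two values:

```python
import statistics


def calc_stats(input_values):
    """Standard-library variant of the numpy-based handler."""
    if not input_values:
        return None
    values = [float(v) for v in input_values if v is not None]
    if len(values) < 2:
        return None  # quantiles() needs at least two data points on older Pythons
    q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
    return {
        'mean': statistics.fmean(values),
        'median': statistics.median(values),
        'std': statistics.pstdev(values),
        'percentile_25': q1,
        'percentile_75': q3,
    }


print(calc_stats(list(range(10, 101, 10))))
```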

### Example 6b: Multiple IMPORTS from Stage

In [None]:
validators_code = '''import re

def is_valid_phone(phone):
    if not phone:
        return False
    digits = re.sub(r"\\D", "", phone)
    return len(digits) == 10 or len(digits) == 11

def is_valid_zip(zipcode):
    if not zipcode:
        return False
    return bool(re.match(r"^\\d{5}(-\\d{4})?$", str(zipcode)))
'''

formatters_code = '''import re

def format_phone(phone):
    if not phone:
        return phone
    digits = re.sub(r"\\D", "", phone)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return phone
'''

with open('/tmp/validators.py', 'w') as f:
    f.write(validators_code)
with open('/tmp/formatters.py', 'w') as f:
    f.write(formatters_code)

session.file.put('/tmp/validators.py', '@CODE_STAGE', auto_compress=False, overwrite=True)
session.file.put('/tmp/formatters.py', '@CODE_STAGE', auto_compress=False, overwrite=True)
print("Uploaded validators.py and formatters.py to stage")

In [None]:
session.sql("""
CREATE OR REPLACE FUNCTION clean_contact_info(phone VARCHAR, zipcode VARCHAR)
RETURNS VARIANT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
IMPORTS = ('@CODE_STAGE/validators.py', '@CODE_STAGE/formatters.py')
HANDLER = 'clean_contact'
AS $$
import validators
import formatters

def clean_contact(phone, zipcode):
    return {
        'phone': {
            'original': phone,
            'formatted': formatters.format_phone(phone),
            'is_valid': validators.is_valid_phone(phone)
        },
        'zipcode': {
            'original': zipcode,
            'is_valid': validators.is_valid_zip(zipcode)
        }
    }
$$
""").collect()
print("Created UDF with multiple imports")

In [None]:
session.sql("SELECT clean_contact_info('5551234567', '12345') as valid_contact").to_pandas()
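The handler above imports the staged files as ordinary modules. You can reproduce that arrangement locally by placing copies of the modules in a directory on `sys.path`. This is a sketch. It assumes abbreviated copies of `validators.py` and `formatters.py`, and a local `clean_contact` that mirrors the inline handler. The temporary directory is only a stand-in for wherever Snowflake places `IMPORTS` files:

```python
import importlib
import os
import sys
import tempfile

# Abbreviated local copies of the two staged modules
MODULE_SOURCES = {
    'validators.py': '''import re

def is_valid_phone(phone):
    if not phone:
        return False
    digits = re.sub(r"\\D", "", phone)
    return len(digits) in (10, 11)

def is_valid_zip(zipcode):
    if not zipcode:
        return False
    return bool(re.match(r"^\\d{5}(-\\d{4})?$", str(zipcode)))
''',
    'formatters.py': '''import re

def format_phone(phone):
    if not phone:
        return phone
    digits = re.sub(r"\\D", "", phone)
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return phone
''',
}

# Stand-in for the import directory: write the files and put it first on sys.path
import_dir = tempfile.mkdtemp()
for filename, source in MODULE_SOURCES.items():
    with open(os.path.join(import_dir, filename), 'w') as f:
        f.write(source)
sys.path.insert(0, import_dir)
importlib.invalidate_caches()

validators = importlib.import_module('validators')
formatters = importlib.import_module('formatters')


def clean_contact(phone, zipcode):
    """Local mirror of the inline handler in clean_contact_info."""
    return {
        'phone': {
            'original': phone,
            'formatted': formatters.format_phone(phone),
            'is_valid': validators.is_valid_phone(phone),
        },
        'zipcode': {
            'original': zipcode,
            'is_valid': validators.is_valid_zip(zipcode),
        },
    }


print(clean_contact('5551234567', '12345'))
```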

---
## Summary: Best Practices Checklist

| Scenario | Recommendation |
|----------|----------------|
| Simple transformation | Inline SQL UDF |
| Short Python logic (under ~50 lines) | Inline Python UDF |
| Reusable business logic | Staged Python files |
| Need external packages | Use PACKAGES clause |
| Multiple related functions | Single staged .py file |

**Key Reminders:**
- Use **snake_case** naming
- Add **default values** for optional parameters  
- Handle **NULL inputs** gracefully
- Keep handlers **stateless**
- Write temporary files only to **/tmp**
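When any NULL argument should simply produce NULL, you can declare the function with `RETURNS NULL ON NULL INPUT` (synonym `STRICT`) in `CREATE FUNCTION`. Snowflake then skips the handler for such rows. When only some arguments should short-circuit, a small decorator keeps the check out of the handler body. The sketch below uses a hypothetical `null_safe` helper and a hypothetical `label_score` handler; neither is part of Snowpark:

```python
import functools


def null_safe(*arg_positions):
    """Hypothetical decorator: return None if any listed positional argument is None.
    With no positions given, every positional argument is checked."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            checked = arg_positions or range(len(args))
            if any(args[i] is None for i in checked if i < len(args)):
                return None
            return func(*args)
        return wrapper
    return decorator


@null_safe(0)
def label_score(score, label='score'):
    # score must not be NULL; a NULL label falls back to 'score'
    return f"{label or 'score'}: {score}"


print(label_score(90, 'math'))  # math: 90
print(label_score(None, 'x'))   # None
```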

---
## Cleanup (Optional)

In [None]:
# Uncomment to clean up
# db = session.get_current_database()
# session.sql(f"DROP SCHEMA IF EXISTS {db}.FUNC_PROC_DEMO CASCADE").collect()
# print("Cleaned up demo objects")