# Databricks Setup Guide for Customer.IO Integration

This notebook provides a complete walkthrough for setting up the Customer.IO API client library on a Databricks cluster. Follow these steps to get started with Customer.IO integration in your Databricks environment.

## Step 1: Library Installation

First, let's install the required libraries. You have three options:

In [None]:
# Option 1: Install from a wheel file (if you've built and uploaded one)
# %pip install /dbfs/libraries/customerio_api_client-1.0.0-py3-none-any.whl

# Option 2: Install individual requirements (recommended for development)
# Quote each specifier: an unquoted ">" can be interpreted as shell output redirection
%pip install "httpx>=0.25.0" "pydantic>=2.0.0" "pandas>=2.0.0" "python-dotenv>=1.0.0" "structlog>=24.0.0"

# Option 3: Install from git repository (if accessible)
# %pip install git+https://github.com/your-org/customer_io_notebooks.git

## Step 2: Restart Python Kernel

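After installing libraries, restart the Python process so the newly installed versions are loaded. Restarting clears all Python state (variables and imports), so run it before any other setup cells: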

In [None]:
# Restart the Python kernel to ensure new libraries are loaded
dbutils.library.restartPython()

## Step 3: Verify Installation

Let's verify that the libraries were installed correctly:

In [None]:
# Check installed packages (importlib.metadata replaces the deprecated pkg_resources)
from importlib.metadata import PackageNotFoundError, version

required_packages = ['httpx', 'pydantic', 'pandas', 'python-dotenv', 'structlog']

for package in required_packages:
    try:
        print(f"✓ {package}: {version(package)}")
    except PackageNotFoundError:
        print(f"✗ {package}: NOT INSTALLED")

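The cell above only checks that each package is present. The sketch below also compares installed versions against the minimums from the Step 1 `%pip install` line. It uses a deliberately simple comparison of the leading numeric release segment (suffixes such as `rc1` or `.post1` are ignored); the helpers `release_tuple`, `meets_minimum`, and `check_minimum_versions` are hypothetical names, and for full PEP 440 semantics you would use `packaging.version.Version` instead.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions copied from the Step 1 %pip install line.
MINIMUM_VERSIONS = {
    "httpx": "0.25.0",
    "pydantic": "2.0.0",
    "pandas": "2.0.0",
    "python-dotenv": "1.0.0",
    "structlog": "24.0.0",
}


def release_tuple(version_string: str) -> Tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.0.0rc1' -> (2, 0, 0).

    Simplification: anything after the numeric part (rc1, .post1, +local) is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if not match:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_minimum_versions(minimums: Dict[str, str]) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or a 'too old' message."""
    status = {}
    for package, minimum in minimums.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            status[package] = "missing"
            continue
        if meets_minimum(installed, minimum):
            status[package] = "ok"
        else:
            status[package] = f"too old ({installed} < {minimum})"
    return status


for package, result in check_minimum_versions(MINIMUM_VERSIONS).items():
    print(f"{package}: {result}")
```

Padding with zeros makes `2` and `2.0.0` compare as equal, which matches how these minimums are written.
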
## Step 4: Set Up Databricks Secrets

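For security, we'll store the API credentials in Databricks secrets rather than in the notebook. First, let's check whether the secret scope exists. (The CLI commands printed below use the legacy Databricks CLI syntax; with the newer Databricks CLI the equivalents are `databricks secrets create-scope customer-io` and `databricks secrets put-secret customer-io <key>`.)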

In [None]:
# List available secret scopes
try:
    scopes = dbutils.secrets.listScopes()
    print("Available secret scopes:")
    for scope in scopes:
        print(f"  - {scope.name}")
    
    # Check if customer-io scope exists
    if any(scope.name == 'customer-io' for scope in scopes):
        print("\n✓ 'customer-io' scope exists")
    else:
        print("\n✗ 'customer-io' scope not found. Please create it using the Databricks CLI:")
        print("  databricks secrets create-scope --scope customer-io")
except Exception as e:
    print(f"Error listing scopes: {e}")

In [None]:
# List secrets in the customer-io scope
try:
    secrets = dbutils.secrets.list("customer-io")
    print("Secrets in 'customer-io' scope:")
    for secret in secrets:
        print(f"  - {secret.key}")
    
    # Check for required secrets
    required_secrets = ['api_key', 'region']
    existing_keys = [s.key for s in secrets]
    
    for required in required_secrets:
        if required in existing_keys:
            print(f"\n✓ '{required}' secret exists")
        else:
            print(f"\n✗ '{required}' secret missing. Add it using:")
            print(f"  databricks secrets put --scope customer-io --key {required}")
            
except Exception as e:
    print(f"Error accessing secrets: {e}")
    print("\nTo create secrets, use the Databricks CLI:")
    print("  databricks secrets put --scope customer-io --key api_key")
    print("  databricks secrets put --scope customer-io --key region")

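The missing-secret logic above can be pulled out into a pure function that takes the scope name and the existing key names, which makes it testable without `dbutils`. A minimal sketch; the function name `missing_secret_commands` is hypothetical, and the command strings use the same legacy CLI syntax as the cell above.

```python
from typing import Iterable, List, Sequence

# Secret keys this guide expects in the scope.
REQUIRED_SECRETS = ("api_key", "region")


def missing_secret_commands(
    scope: str,
    existing_keys: Iterable[str],
    required: Sequence[str] = REQUIRED_SECRETS,
) -> List[str]:
    """Return one `databricks secrets put` command per required key not in the scope."""
    present = set(existing_keys)
    return [
        f"databricks secrets put --scope {scope} --key {key}"
        for key in required
        if key not in present
    ]


# In the notebook, feed it the keys listed by dbutils:
# keys = [s.key for s in dbutils.secrets.list("customer-io")]
# for command in missing_secret_commands("customer-io", keys):
#     print("Missing secret; add it with:", command)
```
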
## Step 5: Test Module Imports

Now let's test importing the Customer.IO modules. We'll handle both packaged and development scenarios:

In [None]:
import os
import sys

# For development: Add src directory to path if it exists
# This is useful if you've uploaded the source code directly
if os.path.exists('/Workspace/customer_io_notebooks/src'):
    sys.path.insert(0, '/Workspace/customer_io_notebooks/src')
    print("Added src directory to Python path")

# Try importing the modules
try:
    from pipelines_api.api_client import CustomerIOClient
    from pipelines_api.people_manager import identify_user
    print("✓ Successfully imported Customer.IO modules")
except ImportError as e:
    print(f"✗ Import error: {e}")
    print("\nTroubleshooting steps:")
    print("1. Ensure you've installed the customerio-api-client package")
    print("2. Or upload the src/ directory to your workspace")
    print("3. Check the Python path:")
    print(f"   Current sys.path: {sys.path[:3]}...")

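If the source code may live in more than one workspace location, the single hard-coded path check above can be generalized to try several candidates in order. In this sketch the helper `add_first_existing_to_path` is hypothetical, and the second candidate path is a placeholder to replace with wherever you uploaded `src/`.

```python
import os
import sys
from typing import Iterable, Optional


def add_first_existing_to_path(candidates: Iterable[str]) -> Optional[str]:
    """Prepend the first existing directory in `candidates` to sys.path.

    Returns that directory, or None if no candidate exists. A directory that is
    already on sys.path is not inserted a second time.
    """
    for candidate in candidates:
        if os.path.isdir(candidate):
            if candidate not in sys.path:
                sys.path.insert(0, candidate)
            return candidate
    return None


# Candidate locations: the first is the path used above; the second is a
# placeholder to replace with your own upload location.
SRC_CANDIDATES = [
    "/Workspace/customer_io_notebooks/src",
    "/Workspace/Users/<your-user>/customer_io_notebooks/src",  # placeholder
]

added = add_first_existing_to_path(SRC_CANDIDATES)
print(f"Using source directory: {added}" if added else "No src directory found; relying on the installed package")
```
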
## Step 6: Test API Connection

Let's test the connection to Customer.IO API using the configured credentials:

In [None]:
# Load credentials from Databricks secrets, falling back to environment variables
import os

try:
    API_KEY = dbutils.secrets.get(scope="customer-io", key="api_key")
    REGION = dbutils.secrets.get(scope="customer-io", key="region")
    print("✓ Successfully loaded credentials from Databricks secrets")
    print(f"  Region: {REGION}")
except Exception as e:
    print(f"✗ Error loading secrets: {e}")
    print("\nFalling back to environment variables...")
    # No placeholder default for the key, so a missing credential is reported here
    API_KEY = os.getenv('CUSTOMERIO_API_KEY')
    REGION = os.getenv('CUSTOMERIO_REGION', 'us')
    if not API_KEY:
        print("✗ CUSTOMERIO_API_KEY is not set; configure the secrets or the variable before continuing")

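The same secrets-then-environment fallback can be written as a small reusable helper that returns `None` when nothing is configured, so the caller decides how to fail. This is a sketch: `load_setting` is a hypothetical name, and `secret_getter` is any callable that takes a key, such as a lambda wrapping `dbutils.secrets.get` for the `customer-io` scope.

```python
import os
from typing import Callable, Optional


def load_setting(
    secret_getter: Optional[Callable[[str], str]],
    key: str,
    env_var: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return a setting from the secret store, else an env var, else `default`.

    Any exception from `secret_getter` (missing scope, missing key, no dbutils)
    is treated as "not configured" and triggers the environment fallback.
    """
    if secret_getter is not None:
        try:
            return secret_getter(key)
        except Exception:
            pass
    return os.environ.get(env_var, default)


# In a Databricks notebook (dbutils is only available there):
# get_secret = lambda key: dbutils.secrets.get(scope="customer-io", key=key)
# API_KEY = load_setting(get_secret, "api_key", "CUSTOMERIO_API_KEY")
# REGION = load_setting(get_secret, "region", "CUSTOMERIO_REGION", default="us")
# if not API_KEY:
#     raise RuntimeError("No Customer.IO API key found in secrets or CUSTOMERIO_API_KEY")
```
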
In [None]:
# Initialize the Customer.IO client
try:
    client = CustomerIOClient(
        api_key=API_KEY,
        region=REGION
    )
    print("✓ Customer.IO client initialized successfully")
    print(f"  Base URL: {client.base_url}")
    print(f"  Rate limit: {client.rate_limit_requests} requests per {client.rate_limit_window} seconds")
except Exception as e:
    print(f"✗ Error initializing client: {e}")

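Because the region usually comes from a hand-typed secret or environment variable, it can help to normalize and validate it before constructing the client, so a value like `US` or one with stray whitespace fails with a clear message. A minimal sketch that accepts only the `us` and `eu` values this guide mentions; `normalize_region` is a hypothetical helper, and whether `CustomerIOClient` already normalizes the region itself is not shown here.

```python
VALID_REGIONS = ("us", "eu")


def normalize_region(raw: str) -> str:
    """Strip whitespace, lower-case, and reject anything other than 'us' or 'eu'."""
    region = raw.strip().lower()
    if region not in VALID_REGIONS:
        raise ValueError(
            f"Unsupported Customer.IO region {raw!r}; expected one of {VALID_REGIONS}"
        )
    return region


# Usage: normalize REGION, then (re)create the client:
# REGION = normalize_region(REGION)
# client = CustomerIOClient(api_key=API_KEY, region=REGION)
```
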
In [None]:
# Test API connectivity by identifying a throwaway test user.
# Note: this creates (or updates) a real profile in your Customer.IO workspace.
from datetime import datetime

test_user_id = f"databricks_test_{int(datetime.now().timestamp())}"
test_traits = {
    "email": "databricks@example.com",
    "name": "Databricks Test User",
    "source": "databricks_setup"
}

try:
    response = identify_user(client, test_user_id, test_traits)
    print("✓ API connection successful!")
    print(f"  Test user created: {test_user_id}")
except Exception as e:
    print(f"✗ API connection failed: {e}")
    print("\nPlease check:")
    print("1. Your API key is valid")
    print("2. The region setting is correct (us or eu)")
    print("3. Your network allows connections to Customer.IO")

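Transient network problems can make a single connectivity test fail even when the setup is correct. A generic retry-with-exponential-backoff wrapper is one way to handle that. This is a sketch, not part of the `pipelines_api` package: `call_with_retries` is a hypothetical name, and the `sleep` parameter is injectable so the backoff can be tested without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with delays base_delay * 1, 2, 4, ...

    The last exception is re-raised once all attempts are used.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


# Example (assumes the client lets httpx network errors propagate); retrying
# only on transport errors means an invalid API key is reported immediately:
# import httpx
# response = call_with_retries(
#     lambda: identify_user(client, test_user_id, test_traits),
#     retry_on=(httpx.TransportError,),
# )
```

Narrowing `retry_on` to network-level exceptions keeps configuration mistakes, such as a wrong key or region, from being hidden behind several slow retries.
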
## Step 7: Working with DBFS (Databricks File System)

If you need to work with files, here's how to use DBFS paths:

In [None]:
# Example: List files in DBFS
try:
    # List root DBFS directory
    files = dbutils.fs.ls('/')
    print("DBFS root contents:")
    for file in files[:5]:  # Show first 5 items
        print(f"  - {file.path} ({'dir' if file.isDir() else 'file'})")
    
    # Create a test directory for Customer.IO data
    test_dir = '/customer_io_data'
    dbutils.fs.mkdirs(test_dir)
    print(f"\n✓ Created directory: {test_dir}")
    
except Exception as e:
    print(f"Error with DBFS: {e}")

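`dbutils.fs` and Spark address DBFS with `dbfs:/...` URIs (or plain absolute paths such as `/customer_io_data`), while local file APIs like Python's `open()` and pandas go through the `/dbfs/` mount. Two small hypothetical helpers, `to_fuse_path` and `to_dbfs_uri`, sketch the conversion between the two forms.

```python
def to_fuse_path(path: str) -> str:
    """'dbfs:/a/b' or '/a/b' (dbutils/Spark style) -> '/dbfs/a/b' for local file APIs."""
    if path == "/dbfs" or path.startswith("/dbfs/"):
        return path  # already a local /dbfs path
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    if not path.startswith("/"):
        raise ValueError(f"Expected an absolute DBFS path, got {path!r}")
    return "/dbfs" + path


def to_dbfs_uri(path: str) -> str:
    """'/dbfs/a/b' (local path) -> 'dbfs:/a/b' for dbutils.fs and Spark."""
    if path.startswith("dbfs:/"):
        return path  # already a URI
    if path.startswith("/dbfs/"):
        return "dbfs:" + path[len("/dbfs"):]
    raise ValueError(f"Not a /dbfs/ path: {path!r}")


# Example: write with pandas through the local path, list with dbutils via the URI.
# df.to_csv(to_fuse_path("/customer_io_data/users.csv"), index=False)
# dbutils.fs.ls(to_dbfs_uri("/dbfs/customer_io_data"))
```
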
## Step 8: Verify Complete Setup

Let's run a final check to ensure everything is configured correctly:

In [None]:
# Setup verification checklist
print("Customer.IO Databricks Setup Checklist:")
print("=" * 40)

all_ok = True

# Check 1: Libraries
try:
    import httpx
    import pydantic
    print("✓ Required libraries installed")
except ImportError:
    all_ok = False
    print("✗ Missing required libraries")

# Check 2: Secrets
try:
    _ = dbutils.secrets.get(scope="customer-io", key="api_key")
    _ = dbutils.secrets.get(scope="customer-io", key="region")
    print("✓ Databricks secrets configured")
except Exception:
    all_ok = False
    print("✗ Databricks secrets not configured")

# Check 3: Module imports
try:
    from pipelines_api.api_client import CustomerIOClient
    print("✓ Customer.IO modules importable")
except ImportError:
    all_ok = False
    print("✗ Customer.IO modules not found")

# Check 4: API client (confirms Step 6 created it; live connectivity is tested in Step 6)
if 'client' in globals():
    print("✓ API client initialized")
else:
    all_ok = False
    print("✗ API client not initialized")

print("\n" + "=" * 40)
if all_ok:
    print("Setup complete! You can now run the Customer.IO notebooks.")
else:
    print("Setup incomplete: fix the items marked ✗ above, then rerun this cell.")

## Next Steps

Now that your Databricks environment is set up, you can:

1. **Run the example notebooks**:
   - Start with `pipelines_api/00_setup_and_configuration.ipynb`
   - Try `pipelines_api/01_people_management.ipynb` for user management
   - Explore other notebooks for different API features

2. **Create scheduled jobs** for regular data syncs

3. **Build Delta tables** to store Customer.IO data

4. **Integrate with your data pipeline**

## Troubleshooting

If you encounter issues:

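- **Import errors**: Ensure the library is installed and the Python kernel has been restarted
- **Secret errors**: Verify the secret scope and keys exist
- **API errors**: Check your credentials and network connectivity
- **Path errors**: `dbutils.fs` and Spark take DBFS paths such as `dbfs:/customer_io_data`; local file APIs such as Python's `open()` and pandas need the `/dbfs/` prefix (e.g. `/dbfs/customer_io_data`)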

For more details, see the `DATABRICKS_SETUP.md` documentation.