<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">05 - Enable</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>

<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Consumer Integration

## Overview

You've successfully migrated data and workloads to Databricks. Now comes the critical phase: **operationalizing the platform for ongoing use at scale**. This module focuses on integrating the downstream consumers of your data and AI assets - from BI analysts querying dashboards to ML engineers deploying models to production.

**Key Distinction:** Earlier modules *built* things; this module is about *operationalizing* and *scaling* them for ongoing use.

Consumer integration encompasses:
- **BI tool connectivity** - Connecting Tableau, Power BI, Looker, and other BI platforms to SQL Warehouses
- **Dashboard migration** - Porting dashboards from {SOURCE_PLATFORM} to Databricks SQL or connected BI tools
- **ML pipeline integration** - Integrating model training, feature engineering, and serving with Unity Catalog
- **API and SDK access** - Enabling programmatic access for applications and downstream systems

## Learning Objectives

By the end of this lesson, you will be able to:
- Configure BI tool connections to Databricks SQL Warehouses
- Migrate and validate dashboards with SLA requirements
- Integrate ML pipelines with Unity Catalog governance
- Implement API and SDK patterns for downstream consumers
- Establish monitoring and SLA validation for consumer workloads

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 16px 20px; border-radius: 4px; margin: 16px 0;">
    <div style="display: flex; align-items: flex-start; gap: 12px;">
        <span style="font-size: 24px;">ℹ️</span>
        <div>
            <strong style="color: #0d47a1; font-size: 1.1em;">Consumer Integration is about Scale and Operationalization</strong>
            <p style="margin: 8px 0 0 0; color: #333;">
                This phase is not about proving the platform works - you've already done that in <strong>Activate</strong>. It's about enabling hundreds or thousands of users to consume data and AI assets reliably, with predictable performance and governance. Focus on <strong>self-service, monitoring, and SLA management</strong>.
            </p>
        </div>
    </div>
</div>

## Consumer Integration Architecture

Understanding the consumer landscape helps you plan integration priorities and resource allocation.

<br />

<div class="mermaid">
graph TB
    subgraph "Data & AI Platform"
        UC[Unity Catalog<br/>Governance Layer]
        SQLW[SQL Warehouses<br/>BI & Analytics]
        DLT[Delta Live Tables<br/>Pipelines]
        FS[Feature Store<br/>ML Features]
        MR[Model Registry<br/>ML Models]
        MS[Model Serving<br/>Inference APIs]
    end

    subgraph "BI Consumers"
        Tableau[Tableau]
        PowerBI[Power BI]
        Looker[Looker]
        DBSQL[Databricks SQL<br/>Native Dashboards]
    end

    subgraph "ML Consumers"
        Training[Model Training<br/>Notebooks/Jobs]
        Serving[Application<br/>Integration]
        Batch[Batch Inference<br/>Pipelines]
    end

    subgraph "Application Consumers"
        REST[REST APIs<br/>Applications]
        SDK[Python/R SDK<br/>Custom Tools]
        JDBC[JDBC/ODBC<br/>Legacy Apps]
    end

    UC --> SQLW
    UC --> FS
    UC --> MR

    SQLW --> Tableau
    SQLW --> PowerBI
    SQLW --> Looker
    SQLW --> DBSQL

    FS --> Training
    MR --> MS
    MS --> Serving
    DLT --> Batch

    SQLW --> REST
    SQLW --> SDK
    SQLW --> JDBC

    style UC fill:#FF3621,stroke:#333,stroke-width:2px,color:#fff
    style SQLW fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style FS fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style MR fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
</div>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>

## Consumer Segmentation and Prioritization

Not all consumers are equal. Segment by business criticality, usage patterns, and migration complexity to prioritize integration efforts.

| Consumer Type | Characteristics | Priority | Migration Complexity |
|--------------|-----------------|----------|---------------------|
| **Executive Dashboards** | Low query volume, high visibility, strict SLAs | **High** | Medium - require validation |
| **Operational BI** | High concurrency, ad-hoc queries, self-service | **High** | Low - standard connectors |
| **Production ML Models** | Real-time inference, strict latency SLAs | **Critical** | High - requires validation |
| **Batch Analytics** | Scheduled reports, predictable load | Medium | Low - scheduled jobs |
| **Data Science Exploration** | Variable usage, flexible SLAs | Medium | Low - notebook migration |
| **Application APIs** | Programmatic access, authentication requirements | High | Medium - SDK integration |

### Prioritization Framework

Use this framework to sequence consumer migrations:

| Factor | Weight | Evaluation Criteria |
|--------|--------|---------------------|
| **Business Impact** | 40% | Revenue impact, executive visibility, regulatory requirements |
| **Usage Volume** | 25% | Query frequency, user count, data volume |
| **SLA Strictness** | 20% | Latency requirements, uptime guarantees, business continuity |
| **Migration Risk** | 15% | Technical complexity, dependencies, rollback difficulty |

**Example Scoring:**
- **Critical (80-100 points):** Migrate immediately, dedicated resources, phased rollout
- **High (60-79 points):** Early migration wave, standard validation
- **Medium (40-59 points):** Mid-wave migration, batch processing
- **Low (0-39 points):** Late wave or self-service migration
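
The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed scoring tool: it assumes each factor is rated on a 0-100 scale where a higher rating means "migrate sooner" (the framework does not fix a rating scale), and the `exec_dashboard` ratings are hypothetical.

```python
# Weights from the Prioritization Framework table (they sum to 1.0).
WEIGHTS = {
    "business_impact": 0.40,
    "usage_volume": 0.25,
    "sla_strictness": 0.20,
    "migration_risk": 0.15,
}

# Score bands from the Example Scoring list: (lower bound, label).
BANDS = [(80, "Critical"), (60, "High"), (40, "Medium"), (0, "Low")]


def priority_score(ratings: dict[str, float]) -> float:
    """Weighted sum of factor ratings, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def priority_band(score: float) -> str:
    """Map a 0-100 score to its band label."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "Low"


# Hypothetical ratings for an executive dashboard.
exec_dashboard = {
    "business_impact": 90,
    "usage_volume": 40,
    "sla_strictness": 85,
    "migration_risk": 50,
}
score = priority_score(exec_dashboard)
print(f"{score:.1f} -> {priority_band(score)}")
```

With these ratings the weighted sum is 36 + 10 + 17 + 7.5 = 70.5, which lands in the High band. Agree on the rating scale and the direction of the risk rating before comparing scores across teams.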

# BI Tool Connectivity

## Overview

BI tools are typically the highest-volume consumers of your data platform. Databricks SQL provides a standards-based connection layer optimized for BI workloads with enterprise-grade performance, concurrency, and governance.

### SQL Warehouse Architecture for BI

SQL Warehouses are optimized compute clusters designed specifically for BI and SQL analytics workloads.

<br />

<div class="mermaid">
graph TB
    subgraph "BI Tools"
        Tableau[Tableau<br/>Partner Connect]
        PowerBI[Power BI<br/>Partner Connect]
        Looker[Looker<br/>JDBC/ODBC]
        Other[Other BI Tools<br/>JDBC/ODBC]
    end

    subgraph "SQL Warehouse Layer"
        SW1[SQL Warehouse<br/>BI Production]
        SW2[SQL Warehouse<br/>Executive Dashboards]
        SW3[SQL Warehouse<br/>Ad-hoc Analysis]
    end

    subgraph "Unity Catalog"
        UC[Catalog Governance<br/>Row/Column Security]
        Cache[Result Cache<br/>Query Acceleration]
    end

    subgraph "Delta Tables"
        Gold[Gold Tables<br/>Business Ready]
    end

    Tableau --> SW1
    PowerBI --> SW1
    Looker --> SW2
    Other --> SW3

    SW1 --> UC
    SW2 --> UC
    SW3 --> UC

    UC --> Cache
    Cache --> Gold

    style UC fill:#FF3621,stroke:#333,stroke-width:2px,color:#fff
    style SW1 fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style SW2 fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style SW3 fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
</div>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>

## SQL Warehouse Types and Selection

Choose the right warehouse type based on workload characteristics and budget.

| Warehouse Type | Use Case | Startup Time | Cost Model | Best For |
|----------------|----------|--------------|------------|----------|
| **Serverless** | Production BI, dashboards, APIs | Seconds | DBU-based, billed while the warehouse is running (set auto-stop to limit idle cost) | Variable workloads, near-instant availability |
| **Pro** | High-concurrency BI, analyst workloads | ~2-5 min | DBU-based with auto-scaling | Predictable BI loads, cost optimization |
| **Classic** | Legacy compatibility, specific requirements | ~5-10 min | DBU-based | Special configurations, control plane requirements |

### Warehouse Sizing Guidelines

| Warehouse Size | Concurrent Users | Queries per Hour | Typical Use Case |
|----------------|------------------|------------------|------------------|
| **2X-Small** | 1-5 | < 100 | Development, testing |
| **X-Small** | 5-10 | 100-500 | Small teams, departmental BI |
| **Small** | 10-20 | 500-1,000 | Mid-size teams |
| **Medium** | 20-40 | 1,000-5,000 | Enterprise BI, production dashboards |
| **Large** | 40-80 | 5,000-10,000 | High-concurrency, mission-critical |
| **X-Large+** | 80+ | 10,000+ | Extreme scale, global BI platforms |

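**Auto-scaling Recommendation:** Enable auto-scaling (minimum 1 cluster, maximum 3-4x the minimum) to handle variable loads without overprovisioning. Treat the sizing table as a starting point and confirm it with load testing.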
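
As a rough illustration of how the sizing table and the auto-scaling recommendation combine, the sketch below picks a starting size from the expected number of concurrent users. The function name `suggest_warehouse` and the `scale_factor` parameter are hypothetical, and the thresholds are simply the upper bounds from the table; they are not an official sizing formula.

```python
# Upper bound on concurrent users for each size, taken from the sizing table.
# Where ranges in the table touch (5 appears in two rows), this sketch assigns
# the boundary value to the smaller size.
SIZE_BY_MAX_USERS = [
    (5, "2X-Small"),
    (10, "X-Small"),
    (20, "Small"),
    (40, "Medium"),
    (80, "Large"),
]


def suggest_warehouse(concurrent_users: int, min_clusters: int = 1, scale_factor: int = 3) -> dict:
    """Return a starting size and auto-scaling range for a SQL Warehouse."""
    if concurrent_users < 1:
        raise ValueError("concurrent_users must be at least 1")
    size = "X-Large+"  # anything above the largest listed bound
    for max_users, name in SIZE_BY_MAX_USERS:
        if concurrent_users <= max_users:
            size = name
            break
    return {
        "size": size,
        "min_clusters": min_clusters,
        "max_clusters": min_clusters * scale_factor,  # 3-4x the minimum, per the recommendation
    }


print(suggest_warehouse(35))
```

For 35 concurrent users this suggests a Medium warehouse scaling between 1 and 3 clusters.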

## BI Tool Integration Patterns

### Pattern 1: Partner Connect (Recommended)

**Supported Tools:** Tableau, Power BI, Fivetran, dbt Cloud, others

Partner Connect automates authentication, warehouse provisioning, and credential management.

<div class="code-block" data-language="bash">
# Partner Connect Steps (UI-based):
# 1. Navigate to Databricks workspace → Partner Connect
# 2. Select BI tool (e.g., Tableau, Power BI)
# 3. Click "Connect" - automatic service principal creation
# 4. Download credentials or follow OAuth flow
# 5. Tool-specific configuration wizard completes setup
</div>

**Advantages:**
- One-click setup with minimal configuration
- Automatic service principal and token management
- Tool-optimized warehouse configuration
- Simplified credential rotation

### Pattern 2: JDBC/ODBC Connection (Standard)

**Supported Tools:** All JDBC/ODBC-compatible BI tools

Manual connection using JDBC/ODBC drivers for maximum flexibility.

<div class="code-block" data-language="bash">
# JDBC connection string (wrapped here for readability; supply it to the driver as a single line with no whitespace)
jdbc:databricks://<workspace-url>:443/default;
  transportMode=http;
  ssl=1;
  httpPath=/sql/1.0/warehouses/<warehouse-id>;
  AuthMech=3;
  UID=token;
  PWD=<personal-access-token>
</div>

<div class="code-block" data-language="python">
# Python Example using databricks-sql-connector
from databricks import sql

connection = sql.connect(
    server_hostname="<workspace-url>",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<personal-access-token>"
)

cursor = connection.cursor()
cursor.execute("SELECT * FROM catalog.schema.table LIMIT 10")
result = cursor.fetchall()
cursor.close()
connection.close()
</div>
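
Because the JDBC string is easy to mistype, some teams generate it from its parts. The helper below is a minimal sketch of that idea: `build_jdbc_url` is a hypothetical name, the host and warehouse ID in the usage line are placeholders, and the parameters are exactly those shown in the connection string above.

```python
def build_jdbc_url(workspace_host: str, warehouse_id: str, token: str) -> str:
    """Assemble the single-line JDBC URL for a SQL Warehouse using token auth."""
    params = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": f"/sql/1.0/warehouses/{warehouse_id}",
        "AuthMech": "3",  # user/password mechanism; the user is the literal string "token"
        "UID": "token",
        "PWD": token,
    }
    joined = ";".join(f"{key}={value}" for key, value in params.items())
    return f"jdbc:databricks://{workspace_host}:443/default;{joined}"


# Placeholder values; in practice read the token from a secret store, not source code.
print(build_jdbc_url("<workspace-url>", "<warehouse-id>", "<personal-access-token>"))
```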

### Pattern 3: OAuth 2.0 (Enterprise)

**Supported Tools:** Tableau, Power BI, custom applications

OAuth provides user-specific authentication and fine-grained access control.

<div class="code-block" data-language="bash">
# OAuth Configuration Steps:
# 1. Settings → Developer → OAuth Applications
# 2. Create new OAuth app with redirect URIs
# 3. Configure BI tool to use OAuth flow
# 4. Users authenticate via Databricks workspace login
# 5. Access inherits Unity Catalog permissions
</div>

**Advantages:**
- User-specific auditing (no shared credentials)
- Unity Catalog row/column security honored
- Automatic credential refresh
- Compliance-friendly (individual accountability)

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'bash';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## BI Tool-Specific Configuration

### Tableau Integration

**Connection Method:** Partner Connect (recommended) or native Databricks connector

<div class="code-block" data-language="bash">
# Tableau Desktop Configuration
# 1. Connect → More → Databricks
# 2. Enter server hostname (workspace URL)
# 3. HTTP Path: /sql/1.0/warehouses/<warehouse-id>
# 4. Authentication: Personal Access Token or OAuth
# 5. Select catalog, schema, tables
</div>

**Optimization Tips:**
- Enable **Tableau Hyper extracts** for large datasets (faster dashboard interaction, scheduled refresh)
- Use **Live connections** for real-time dashboards (query pushdown to Photon)
- Configure **connection pooling** for high-concurrency environments
- Leverage **Tableau Catalog** with Unity Catalog lineage for data governance

### Power BI Integration

**Connection Method:** Partner Connect or native connector

<div class="code-block" data-language="bash">
# Power BI Desktop Configuration
# 1. Get Data → More → Databricks
# 2. Server hostname: <workspace-url>
# 3. HTTP Path: /sql/1.0/warehouses/<warehouse-id>
# 4. Data Connectivity: DirectQuery (recommended) or Import
# 5. Authentication: Personal Access Token or OAuth
</div>

**Optimization Tips:**
- Use **DirectQuery mode** for large datasets (query pushdown to Databricks)
- Configure **incremental refresh** for Import mode to reduce data movement
- Keep Power Query steps **foldable** (query folding) so filters and aggregations push down to the SQL Warehouse
- Use **Power BI Premium** with dedicated capacity for enterprise deployments

### Looker Integration

**Connection Method:** JDBC with the Databricks dialect

<div class="code-block" data-language="bash">
# Looker Connection Configuration
# Dialect: Databricks (select from dialect dropdown)
# Host: <workspace-url>
# Port: 443
# Database: <catalog-name>
# Username: token
# Password: <personal-access-token>
# Additional JDBC parameters: httpPath=/sql/1.0/warehouses/<warehouse-id>
</div>

**Optimization Tips:**
- Use **PDTs (Persistent Derived Tables)** backed by Delta tables for aggregations
- Configure **connection pooling** with appropriate max connections
- Leverage **aggregate awareness** to route queries to pre-aggregated tables
- Enable **SQL Runner** for analysts to explore data directly

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'bash';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## BI Performance Optimization

### Query Acceleration Techniques

| Technique | What It Does | When to Use |
|-----------|--------------|-------------|
| **Photon Engine** | Vectorized C++ query engine | Always (included in SQL Warehouses) |
| **Result Caching** | Caches query results for reuse | Dashboards with repeated queries |
| **Liquid Clustering** | Automatic data layout optimization | Large fact tables with predictable query patterns |
| **Materialized Views** | Pre-computed aggregations | Complex aggregations queried frequently |
| **Z-Ordering** | Data skipping via co-location | High-cardinality filter columns on tables that do not use liquid clustering (the two cannot be combined) |
| **Predictive Optimization** | Automatic maintenance and stats | Always (background optimization) |

### Dashboard-Specific Optimization

<div class="code-block" data-language="sql">
-- Create materialized view for dashboard aggregations
CREATE MATERIALIZED VIEW catalog.schema.sales_summary_mv
AS
SELECT
    date_trunc('day', order_date) AS order_day,
    product_category,
    region,
    COUNT(*) AS order_count,
    SUM(revenue) AS total_revenue,
    AVG(revenue) AS avg_revenue
FROM catalog.schema.orders
GROUP BY date_trunc('day', order_date), product_category, region;

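-- Refresh on demand
REFRESH MATERIALIZED VIEW catalog.schema.sales_summary_mv;

-- Or attach a refresh schedule (Quartz cron syntax; this example runs daily at 02:00 UTC)
ALTER MATERIALIZED VIEW catalog.schema.sales_summary_mv
ADD SCHEDULE CRON '0 0 2 * * ?' AT TIME ZONE 'UTC';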
</div>

<div class="code-block" data-language="sql">
-- Enable liquid clustering for large tables
ALTER TABLE catalog.schema.orders
CLUSTER BY (order_date, customer_id);

-- New writes are clustered incrementally; run OPTIMIZE (or rely on
-- predictive optimization) to cluster data that already exists
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

# Dashboard Migration and SLA Validation

## Overview

Dashboard migration is not just about porting visuals - it's about maintaining **business continuity** and **meeting SLA commitments** while modernizing your BI stack.

### Migration Approach Decision Tree

<br />

<div class="mermaid">
graph TD
    Start[Dashboard Migration Decision] --> Complexity{Dashboard Complexity?}

    Complexity -->|Simple<br/>Single table| Direct[Direct Migration<br/>Repoint connection]
    Complexity -->|Moderate<br/>Multiple tables| Rewrite[Rewrite Queries<br/>Optimize for Databricks]
    Complexity -->|Complex<br/>Heavy logic| Refactor[Refactor Architecture<br/>SQL + Python]

    Direct --> Validate1[Validate Results]
    Rewrite --> Validate2[Validate Results]
    Refactor --> Validate3[Validate Results]

    Validate1 --> SLA{Meets SLA?}
    Validate2 --> SLA
    Validate3 --> SLA

    SLA -->|Yes| Deploy[Deploy to Production]
    SLA -->|No| Optimize[Optimize Performance]

    Optimize --> Retest[Retest SLA]
    Retest --> SLA

    Deploy --> Monitor[Monitor & Alert]

    style Start fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style Deploy fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style SLA fill:#fff9c4,stroke:#f57f17,stroke-width:2px
</div>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>
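
The decision tree can also be read as two small functions: one that picks an approach from dashboard complexity, and one that runs the validate, optimize, retest loop. The sketch below is illustrative only. The names `choose_approach` and `ready_to_deploy` are hypothetical, the `max_rounds` cap is an assumption (the diagram loops until the SLA is met), and the latency figures are made up.

```python
from enum import Enum
from typing import Callable


class Approach(Enum):
    DIRECT = "Direct Migration (repoint connection)"
    REWRITE = "Rewrite Queries (optimize for Databricks)"
    REFACTOR = "Refactor Architecture (SQL + Python)"


def choose_approach(table_count: int, heavy_logic: bool) -> Approach:
    """First branch of the tree: pick an approach from dashboard complexity."""
    if heavy_logic:
        return Approach.REFACTOR
    if table_count > 1:
        return Approach.REWRITE
    return Approach.DIRECT


def ready_to_deploy(meets_sla: Callable[[], bool], optimize: Callable[[], None], max_rounds: int = 3) -> bool:
    """Second half of the tree: test the SLA, optimizing and retesting up to max_rounds times."""
    for _ in range(max_rounds):
        if meets_sla():
            return True
        optimize()
    return meets_sla()


# Hypothetical P95 latencies (seconds) measured after each optimization round.
measured_p95 = [7.2, 5.8, 4.1]
current = {"round": 0}


def meets_sla() -> bool:
    return measured_p95[current["round"]] < 5.0


def optimize() -> None:
    current["round"] += 1


print(choose_approach(table_count=3, heavy_logic=False).value)
print(ready_to_deploy(meets_sla, optimize))
```

In this run the dashboard needs two optimization rounds before its P95 drops below the 5 second target. A dashboard that still misses the SLA after `max_rounds` is a candidate for the next approach up (for example, Rewrite to Refactor).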

## Dashboard Migration Patterns

| Migration Pattern | Description | Use Case | Complexity |
|-------------------|-------------|----------|------------|
| **Repoint** | Change connection, keep dashboard as-is | Simple dashboards, compatible SQL | Low |
| **Rewrite** | Rebuild dashboard in same tool on Databricks | Moderate complexity, query optimization needed | Medium |
| **Refactor** | Redesign dashboard architecture | Heavy transformations, poor performance | High |
| **Hybrid** | Maintain legacy dashboard while building new | Critical dashboards, gradual transition | Medium |

### Migration Checklist by Dashboard Type

| Dashboard Type | Key Validations | Common Issues | Mitigation |
|----------------|-----------------|---------------|------------|
| **Executive** | Results accuracy, refresh time < SLA | Custom SQL not supported | Rewrite using Databricks SQL dialect |
| **Operational** | Concurrency handling, real-time freshness | High query volume | Auto-scaling warehouse, result caching |
| **Analytical** | Complex aggregations, drill-down paths | Slow aggregations | Materialized views, Photon acceleration |
| **Embedded** | API latency, authentication flow | Token expiration | OAuth with refresh tokens |

## SLA Definition and Validation

### Defining Measurable SLAs

SLAs must be **specific, measurable, and business-aligned** to validate migration success.

| SLA Type | Metric | Example Target | Measurement Method |
|----------|--------|----------------|-------------------|
| **Query Latency** | P50, P95, P99 response time | P95 < 5 seconds | Query history analysis |
| **Dashboard Refresh** | End-to-end refresh time | < 30 seconds | Dashboard load time |
| **Data Freshness** | Time from source to BI | < 15 minutes | Pipeline watermarks |
| **Availability** | Uptime percentage | 99.9% uptime | Warehouse availability metrics |
| **Concurrency** | Simultaneous users supported | 100 concurrent users | Load testing |
| **Accuracy** | Data validation pass rate | 100% match with source | Row count, hash comparison |
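
For the Accuracy row, "row count, hash comparison" can be implemented in several ways. One minimal sketch, assuming both result sets fit in memory as tuples of simple values (strings, numbers, `None`) and that source and target return the same Python types for each column, is an order-independent fingerprint. `table_fingerprint` is a hypothetical helper and the sample rows are invented.

```python
import hashlib


def table_fingerprint(rows) -> tuple[int, str]:
    """Return (row_count, order-independent digest) for an iterable of row tuples."""
    count = 0
    combined = 0
    for row in rows:
        # repr() gives a stable text form for simple values (str, int, float, None).
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        # Adding per-row hashes modulo 2**256 ignores row order but still
        # accounts for duplicate rows (XOR would cancel them out).
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


# Invented sample data: same rows, different order.
source_rows = [(1, "north", 120.5), (2, "south", None), (3, "east", 88.0)]
target_rows = [(3, "east", 88.0), (1, "north", 120.5), (2, "south", None)]

print(table_fingerprint(source_rows) == table_fingerprint(target_rows))
```

For large tables, computing the count and an aggregate hash inside each engine and comparing only the results avoids pulling rows to the client; the comparison logic stays the same.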

### SLA Validation Framework

<div class="code-block" data-language="python">
# SLA Validation Script Example
from databricks import sql
import time
import pandas as pd

def validate_dashboard_sla(warehouse_id, queries, p95_target_seconds=5.0):
    """
    Validate dashboard SLA by running representative queries
    and measuring P95 latency.
    """
    connection = sql.connect(
        server_hostname=dbutils.secrets.get("prod", "db-hostname"),
        http_path=f"/sql/1.0/warehouses/{warehouse_id}",
        access_token=dbutils.secrets.get("prod", "db-token")
    )

    latencies = []

    try:
        for query in queries:
            # Time execute + fetch together: the BI tool waits for both
            with connection.cursor() as cursor:
                start_time = time.perf_counter()  # monotonic clock, unaffected by system clock changes
                cursor.execute(query)
                results = cursor.fetchall()
                latency = time.perf_counter() - start_time

            latencies.append({
                'query': query[:50],
                'latency_seconds': latency,
                'row_count': len(results)
            })
    finally:
        # Release the session even if a query fails
        connection.close()

    # Calculate percentiles
    df = pd.DataFrame(latencies)
    p50 = df['latency_seconds'].quantile(0.50)
    p95 = df['latency_seconds'].quantile(0.95)
    p99 = df['latency_seconds'].quantile(0.99)

    sla_met = p95 <= p95_target_seconds

    print(f"P50 Latency: {p50:.2f}s")
    print(f"P95 Latency: {p95:.2f}s")
    print(f"P99 Latency: {p99:.2f}s")
    print(f"SLA Met (P95 <= {p95_target_seconds}s): {sla_met}")

    return sla_met, df

# Example usage
dashboard_queries = [
    "SELECT * FROM catalog.schema.sales_summary WHERE date >= current_date() - 7",
    "SELECT product, SUM(revenue) FROM catalog.schema.orders GROUP BY product"
]

sla_met, results = validate_dashboard_sla(
    warehouse_id="abc123def456",
    queries=dashboard_queries,
    p95_target_seconds=5.0
)
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Dashboard Monitoring and Alerting

### Key Metrics to Monitor

| Metric | What It Measures | Alert Threshold | Action |
|--------|------------------|----------------|--------|
| **Query Failure Rate** | Percentage of failed queries | > 1% | Investigate error logs, check permissions |
| **Query Duration** | P95 query latency | > SLA threshold | Optimize queries, scale warehouse |
| **Warehouse Utilization** | Cluster load percentage | > 80% sustained | Scale up or enable auto-scaling |
| **Queue Time** | Time queries wait in queue | > 10 seconds | Increase warehouse size or concurrency |
| **Data Freshness** | Pipeline delay from source | > freshness SLA | Investigate pipeline delays |
| **Error Codes** | Specific error patterns | > 10 occurrences | Fix query syntax, permissions, or data issues |
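
The thresholds in this table translate directly into a simple check that a scheduled job could run. The sketch below is one possible shape, not a Databricks API: the metric key names and the `breached_metrics` function are hypothetical, the sample values are invented, and the "sustained" qualifier on utilization is ignored (a single sample is compared against 80%).

```python
def breached_metrics(metrics: dict[str, float], p95_sla_seconds: float, freshness_sla_minutes: float) -> list[str]:
    """Return the names of metrics that exceed the alert thresholds from the table."""
    thresholds = {
        "query_failure_rate_pct": 1.0,
        "p95_query_seconds": p95_sla_seconds,
        "warehouse_utilization_pct": 80.0,
        "queue_time_seconds": 10.0,
        "data_freshness_minutes": freshness_sla_minutes,
        "error_code_occurrences": 10,
    }
    # A metric missing from the input is treated as 0, i.e. not breached.
    return [name for name, limit in thresholds.items() if metrics.get(name, 0) > limit]


# Invented sample collected from query history and pipeline watermarks.
sample = {
    "query_failure_rate_pct": 0.4,
    "p95_query_seconds": 6.3,
    "warehouse_utilization_pct": 72.0,
    "queue_time_seconds": 14.0,
    "data_freshness_minutes": 9.0,
    "error_code_occurrences": 3,
}
print(breached_metrics(sample, p95_sla_seconds=5.0, freshness_sla_minutes=15.0))
```

Here the P95 latency and queue time exceed their thresholds, which together point toward scaling the warehouse rather than fixing individual queries.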

### Alerting Configuration Example

<div class="code-block" data-language="python">
# Databricks SQL Alert Configuration
# UI: Alerts → Create alert (alerts run on a saved query)

# Example: Alert on slow queries
"""
Alert Name: Dashboard SLA Breach - Slow Queries
Query:
  SELECT
    statement_id,
    statement_text,
    execution_duration_ms / 1000.0 AS duration_seconds,
    executed_by,
    compute.warehouse_id AS warehouse_id
  FROM system.query.history
  WHERE execution_duration_ms > 5000  -- 5 second threshold
    AND start_time >= current_timestamp() - INTERVAL 1 HOUR
  ORDER BY execution_duration_ms DESC
  LIMIT 10

Schedule: Every 15 minutes
Condition: Rows returned > 0
Notification: Slack #data-platform-alerts, Email: oncall@company.com
"""
</div>

<div class="code-block" data-language="python">
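# Programmatic monitoring via SDK
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import QueryFilter

w = WorkspaceClient()

# Get query history for SLA monitoring
response = w.query_history.list(
    filter_by=QueryFilter(warehouse_ids=["abc123def456"]),
    max_results=100
)
# Recent SDK releases return a response object with a `res` list;
# older releases return an iterator of queries directly
queries = (response.res or []) if hasattr(response, "res") else response

for query in queries:
    # `duration` is the total query time in milliseconds
    if query.duration and query.duration > 5000:  # 5 second threshold
        print(f"SLA breach detected: Query {query.query_id}")
        print(f"Duration: {query.duration / 1000.0}s")
        print(f"User: {query.user_name}")
        # Send alert via monitoring system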
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

# ML Pipeline Integration with Unity Catalog

## Overview

Machine learning pipelines require more than just compute - they need governed access to features, models, and inference infrastructure. Unity Catalog provides unified governance across the entire ML lifecycle.

### ML Lifecycle with Unity Catalog

<br />

<div class="mermaid">
graph LR
    subgraph "Feature Engineering"
        Raw[Raw Data<br/>Bronze/Silver] --> FE[Feature Engineering<br/>Delta Live Tables]
        FE --> FS[Feature Store<br/>Unity Catalog]
    end

    subgraph "Model Training"
        FS --> Train[Model Training<br/>MLflow + UC]
        Train --> MR[Model Registry<br/>Unity Catalog]
    end

    subgraph "Model Serving"
        MR --> MS[Model Serving<br/>Real-time API]
        MR --> Batch[Batch Inference<br/>Scheduled Jobs]
    end

    subgraph "Governance"
        UC[Unity Catalog<br/>Permissions & Lineage]
    end

    FS -.->|Governed by| UC
    MR -.->|Governed by| UC
    MS -.->|Governed by| UC

    style UC fill:#FF3621,stroke:#333,stroke-width:2px,color:#fff
    style FS fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style MR fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
    style MS fill:#1b3b6f,stroke:#333,stroke-width:2px,color:#fff
</div>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>

## Feature Store Integration

Unity Catalog natively supports feature tables with lineage, versioning, and access control.

### Feature Table Creation

<div class="code-block" data-language="python">
from databricks.feature_engineering import FeatureEngineeringClient

fe = FeatureEngineeringClient()

# Create feature table in Unity Catalog
fe.create_table(
    name="catalog.schema.customer_features",
    primary_keys=["customer_id"],
    df=customer_features_df,
    description="Customer demographic and behavioral features for churn prediction",
    tags={"team": "ml-platform", "model": "churn-v2"}
)
</div>

<div class="code-block" data-language="python">
# Grant permissions to ML team
spark.sql("""
    GRANT SELECT ON TABLE catalog.schema.customer_features
    TO `ml-engineers@company.com`
""")

# Read features for training (with automatic lineage tracking)
features = fe.read_table(name="catalog.schema.customer_features")
</div>

### Training Set Creation with Feature Lookups

<div class="code-block" data-language="python">
# Create training set with feature lookups
from databricks.feature_engineering import FeatureLookup

training_set = fe.create_training_set(
    df=labels_df,  # DataFrame with customer_id and label
    feature_lookups=[
        FeatureLookup(
            table_name="catalog.schema.customer_features",
            lookup_key="customer_id"
        )
    ],
    label="churn_label",
    exclude_columns=["customer_id"]
)

# Load the training DataFrame (labels joined with the looked-up features)
training_df = training_set.load_df()
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Model Registry Integration

Unity Catalog serves as the central model registry with governance, lineage, and deployment workflows.

### Model Registration

<div class="code-block" data-language="python">
import mlflow
from mlflow import MlflowClient

mlflow.set_registry_uri("databricks-uc")

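# Train and log model
with mlflow.start_run() as run:
    # Training code (train_model stands in for your own training function)
    model = train_model(training_df)

    # Log model to Unity Catalog. UC requires a model signature, so pass an
    # input example (or an explicit signature) when logging; input_example
    # here stands in for a few rows of your training features.
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        input_example=input_example,
        registered_model_name="catalog.schema.churn_classifier"
    )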
</div>

### Model Versioning and Promotion

<div class="code-block" data-language="python">
client = MlflowClient()

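# Get latest model version (stage-based lookups such as
# get_latest_versions are not supported for Unity Catalog models)
versions = client.search_model_versions("name='catalog.schema.churn_classifier'")
latest_version = max(versions, key=lambda mv: int(mv.version))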

# Promote to production with alias
client.set_registered_model_alias(
    name="catalog.schema.churn_classifier",
    alias="champion",
    version=latest_version.version
)

# Grant serving permissions
spark.sql("""
    GRANT EXECUTE ON MODEL catalog.schema.churn_classifier
    TO `application-service-principal`
""")
</div>
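
Once an alias such as `champion` is set, downstream consumers can refer to the model by alias instead of a hard-coded version number, so promotion does not require a code change on their side. The helper below is a small sketch of how the two MLflow model URI forms could be built. The function name `model_uri` is hypothetical, and the snippet deliberately avoids importing MLflow so it runs anywhere.

```python
def model_uri(name, alias=None, version=None):
    """Build an MLflow model URI for a registered model, by alias or by version."""
    if (alias is None) == (version is None):
        raise ValueError("Provide exactly one of alias or version")
    if alias is not None:
        return f"models:/{name}@{alias}"
    return f"models:/{name}/{version}"


champion_uri = model_uri("catalog.schema.churn_classifier", alias="champion")
pinned_uri = model_uri("catalog.schema.churn_classifier", version=3)

print(champion_uri)
print(pinned_uri)
```

In a workspace with MLflow installed, a batch scoring job could pass `champion_uri` to `mlflow.pyfunc.load_model` and pick up whichever version currently holds the alias.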

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Model Serving Patterns

| Serving Pattern | Use Case | Latency | Throughput | Cost Model |
|-----------------|----------|---------|------------|------------|
| **Real-time (Serverless)** | Low-latency APIs (< 100ms), variable load | < 100ms | High concurrency | Pay per request |
| **Real-time (Provisioned)** | Predictable load, cost optimization | < 200ms | Medium-high | DBU-based |
| **Batch Inference** | Large datasets, scheduled scoring | Minutes | Very high | Job compute |
| **Streaming Inference** | Event-driven predictions | Seconds | High | Streaming cluster |

### Real-time Model Serving

<div class="code-block" data-language="python">
# Create Model Serving Endpoint (UI or API)
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import EndpointCoreConfigInput, ServedEntityInput

w = WorkspaceClient()

w.serving_endpoints.create(
    name="churn-classifier-endpoint",
    config=EndpointCoreConfigInput(
        served_entities=[
            ServedEntityInput(
                entity_name="catalog.schema.churn_classifier",
                entity_version="1",
                workload_size="Small",
                scale_to_zero_enabled=True
            )
        ]
    )
)
</div>

### Calling the Inference API

<div class="code-block" data-language="python">
import requests

# Get endpoint URL and token
endpoint_url = "https://<workspace-url>/serving-endpoints/churn-classifier-endpoint/invocations"
token = dbutils.secrets.get("prod", "db-token")

# Prepare inference payload
payload = {
    "dataframe_records": [
        {"customer_id": "C12345", "age": 35, "tenure_months": 24},
        {"customer_id": "C67890", "age": 42, "tenure_months": 6}
    ]
}

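# Call inference API
response = requests.post(
    endpoint_url,
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=30
)
response.raise_for_status()  # Surface HTTP errors (401, 429, 5xx) instead of parsing an error body

predictions = response.json()
print(predictions)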
</div>
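
Consumers usually need to join the returned predictions back to the records they sent. The sketch below assumes the endpoint responds with a JSON object whose `predictions` list is in the same order as the input records. That response shape, the helper name `attach_predictions`, and the sample response are assumptions for illustration, not captured output.

```python
def attach_predictions(records, response_json, key="customer_id"):
    """Map each input record's key to its prediction, validating the response shape."""
    predictions = response_json.get("predictions")
    if predictions is None:
        raise ValueError(f"No 'predictions' field in response: {response_json}")
    if len(predictions) != len(records):
        raise ValueError("Prediction count does not match record count")
    return {record[key]: prediction for record, prediction in zip(records, predictions)}


records = [
    {"customer_id": "C12345", "age": 35, "tenure_months": 24},
    {"customer_id": "C67890", "age": 42, "tenure_months": 6},
]

# Hypothetical response body, shaped like {"predictions": [...]}
sample_response = {"predictions": [0, 1]}

by_customer = attach_predictions(records, sample_response)
print(by_customer)
```

Validating the count before zipping avoids silently mislabeling customers if the endpoint returns a partial result.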

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

# API and SDK Patterns for Downstream Consumers

## Overview

Applications, services, and custom tools need programmatic access to Databricks. Choose the right API pattern based on access requirements and security posture.

### API Access Patterns

<br />

<div class="mermaid">
graph TB
    subgraph "Consumer Applications"
        WebApp[Web Application]
        Service[Microservice]
        Tool[Custom Tool]
        Script[Automation Script]
    end

    subgraph "Authentication Layer"
        OAuth[OAuth 2.0<br/>User Context]
        PAT[Personal Access Token<br/>Dev/Testing]
        SP[Service Principal<br/>Production Apps]
    end

    subgraph "Databricks APIs"
        SQLAPI[SQL Statement API<br/>Query Execution]
        RESTAPI[REST API<br/>Management]
        SDK[Python/Go SDK<br/>Type-safe Client]
        JDBC[JDBC/ODBC<br/>Standards-based]
    end

    subgraph "Data & Compute"
        Tables[Unity Catalog Tables]
        Warehouses[SQL Warehouses]
        Jobs[Jobs & Workflows]
    end

    WebApp --> OAuth
    Service --> SP
    Tool --> PAT
    Script --> SP

    OAuth --> SDK
    PAT --> RESTAPI
    SP --> SDK
    SP --> SQLAPI

    SDK --> Tables
    SQLAPI --> Warehouses
    RESTAPI --> Jobs

    style SP fill:#c8e6c9,stroke:#388e3c,stroke-width:2px
    style OAuth fill:#fff9c4,stroke:#f57f17,stroke-width:2px
</div>

<script type="module">
  import mermaid from 'https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs';
  mermaid.initialize({ startOnLoad: true, theme: 'default' });
</script>

## API Authentication Methods

| Method | Use Case | Security | Token Lifetime | Best For |
|--------|----------|----------|----------------|----------|
| **Service Principal** | Production apps, CI/CD, automation | High | Indefinite (rotatable) | Production workloads |
| **OAuth 2.0** | User-facing apps, delegated access | Highest | 1 hour (auto-refresh) | Multi-tenant apps |
| **Personal Access Token** | Development, testing, notebooks | Medium | Configurable (90 days default) | Dev/test only |

<div style="border-left: 4px solid #f57c00; background: #fff3e0; padding: 16px 20px; border-radius: 4px; margin: 16px 0;">
    <div style="display: flex; align-items: flex-start; gap: 12px;">
        <span style="font-size: 24px;">⚠️</span>
        <div>
            <strong style="color: #e65100; font-size: 1.1em;">Never use Personal Access Tokens in production</strong>
            <p style="margin: 8px 0 0 0; color: #333;">
                PATs are tied to individual users, so access breaks when that user leaves and rotation cannot be managed centrally. Always use <strong>Service Principals</strong> for production applications and automation.
            </p>
        </div>
    </div>
</div>
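
Short-lived tokens, such as the one-hour OAuth tokens in the table above, have to be refreshed before they expire. SDK clients typically handle this themselves; the sketch below is for consumers that call the REST API directly. The `TokenCache` class, its `fetch_token` callable, and the 60-second refresh margin are hypothetical, and the demo uses a fake clock and a fake token source instead of a real token endpoint.

```python
import time


class TokenCache:
    """Cache a short-lived access token and refresh it shortly before expiry."""

    def __init__(self, fetch_token, refresh_margin_s=60, clock=time.monotonic):
        self._fetch_token = fetch_token  # callable returning (token, lifetime_seconds)
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # Refresh when there is no token yet or it is within the margin of expiring
        if self._token is None or now >= self._expires_at - self._refresh_margin_s:
            token, lifetime_s = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime_s
        return self._token


# Demo with a fake clock and a fake token source (no network calls)
fake_now = [0.0]
issued = []


def fake_fetch():
    issued.append(f"token-{len(issued) + 1}")
    return issued[-1], 3600  # one-hour lifetime


cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 1800.0   # 30 minutes in: still valid
second = cache.get()
fake_now[0] = 3550.0   # inside the 60-second refresh margin
third = cache.get()
print(first, second, third)
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.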

## SQL Statement API (Recommended)

The SQL Statement Execution API (called the SQL Statement API in this lesson) is the recommended way to run queries programmatically. It supports asynchronous execution, explicit statement states for error handling, and chunked result pagination.

### Python SDK Example

<div class="code-block" data-language="python">
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementState

# Initialize client with service principal
w = WorkspaceClient(
    host="https://<workspace-url>",
    client_id=dbutils.secrets.get("prod", "sp-client-id"),
    client_secret=dbutils.secrets.get("prod", "sp-client-secret")
)

# Execute query: wait up to 30s for a synchronous result, then continue asynchronously
statement = w.statement_execution.execute_statement(
    warehouse_id="abc123def456",
    statement="SELECT * FROM catalog.schema.customers WHERE region = 'US'",
    wait_timeout="30s"
)

# Poll for completion if the statement is still running after the wait timeout
while statement.status.state in [StatementState.PENDING, StatementState.RUNNING]:
    time.sleep(1)
    statement = w.statement_execution.get_statement(statement.statement_id)

# Get results (the inline result holds only the first chunk of a large result set)
if statement.status.state == StatementState.SUCCEEDED:
    results = statement.result
    for row in results.data_array or []:
        print(row)
else:
    print(f"Query failed: {statement.status.error}")
</div>
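
Large result sets are returned in chunks, and each chunk points to the next one. The generator below sketches how a consumer might walk those chunks. It assumes each chunk exposes `data_array` and `next_chunk_index`, as the SDK's result objects do, and takes a `fetch_chunk` callable so the logic can be exercised here with fake chunks. The names `iter_rows` and `fake_chunks` are hypothetical.

```python
from types import SimpleNamespace


def iter_rows(first_chunk, fetch_chunk):
    """Yield rows from the first chunk, then follow next_chunk_index until exhausted."""
    chunk = first_chunk
    while chunk is not None:
        for row in chunk.data_array or []:
            yield row
        next_index = chunk.next_chunk_index
        chunk = fetch_chunk(next_index) if next_index is not None else None


# Fake chunks standing in for API responses (values arrive as strings)
fake_chunks = {
    0: SimpleNamespace(data_array=[["1", "US"], ["2", "US"]], next_chunk_index=1),
    1: SimpleNamespace(data_array=[["3", "US"]], next_chunk_index=None),
}

rows = list(iter_rows(fake_chunks[0], fake_chunks.get))
print(rows)
```

With the SDK, `first_chunk` would be `statement.result` and `fetch_chunk` could wrap `w.statement_execution.get_statement_result_chunk_n(statement.statement_id, index)`.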

### REST API Example

<div class="code-block" data-language="bash">
# Execute SQL via REST API
curl -X POST https://<workspace-url>/api/2.0/sql/statements/ \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "warehouse_id": "abc123def456",
    "statement": "SELECT COUNT(*) FROM catalog.schema.orders",
    "wait_timeout": "30s"
  }'
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## JDBC/ODBC Connectivity

Standards-based connectivity for legacy applications and tools that don't support native Databricks SDKs.

### Java JDBC Example

<div class="code-block" data-language="java">
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DatabricksJDBCExample {
    public static void main(String[] args) throws Exception {
        String jdbcUrl = "jdbc:databricks://<workspace-url>:443/default;" +
                         "transportMode=http;" +
                         "ssl=1;" +
                         "httpPath=/sql/1.0/warehouses/<warehouse-id>;" +
                         "AuthMech=3;" +
                         "UID=token;" +
                         "PWD=<service-principal-token>";

        // try-with-resources closes the result set, statement, and connection
        // even if the query throws
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT * FROM catalog.schema.products LIMIT 10")) {

            while (rs.next()) {
                System.out.println(rs.getString("product_name"));
            }
        }
    }
}
</div>

### Python ODBC Example

<div class="code-block" data-language="python">
import pyodbc

connection_string = (
    "Driver={Simba Spark ODBC Driver};"
    "Host=<workspace-url>;"
    "Port=443;"
    "HTTPPath=/sql/1.0/warehouses/<warehouse-id>;"
    "SSL=1;"
    "ThriftTransport=2;"
    "AuthMech=3;"
    "UID=token;"
    "PWD=<service-principal-token>"
)

conn = pyodbc.connect(connection_string, autocommit=True)
cursor = conn.cursor()

cursor.execute("SELECT * FROM catalog.schema.products LIMIT 10")
rows = cursor.fetchall()

for row in rows:
    print(row)

cursor.close()
conn.close()
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-java.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Rate Limiting and Best Practices

### API Rate Limits

| API Type | Rate Limit | Burst Limit | Throttle Behavior |
|----------|-----------|-------------|-------------------|
| **SQL Statement API** | 100 concurrent statements | 300 requests/min | Queue requests |
| **REST API** | 30 requests/sec | 120 requests/sec | HTTP 429 (retry after) |
| **JDBC/ODBC** | Warehouse dependent | Auto-scaling | Queue at warehouse |

### Best Practices for Consumer Integration

| Practice | Why It Matters | Implementation |
|----------|----------------|----------------|
| **Use Service Principals** | Avoid user dependency, support rotation | Create SP per application |
| **Implement Retry Logic** | Handle transient failures gracefully | Exponential backoff with jitter |
| **Cache Results** | Reduce query volume, improve latency | Application-level caching (Redis, etc.) |
| **Query Optimization** | Minimize compute costs | Use LIMIT, filter early, avoid SELECT * |
| **Connection Pooling** | Reduce connection overhead | Configure pool size based on concurrency |
| **Monitoring & Alerting** | Detect issues before users complain | Track latency, errors, throughput |

### Example: Retry Logic with Exponential Backoff

<div class="code-block" data-language="python">
import time
import random
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import TooManyRequests

def execute_with_retry(w, warehouse_id, statement, max_retries=3):
    """
    Execute SQL, retrying rate-limited requests (HTTP 429) with exponential backoff.
    """
    for attempt in range(max_retries):
        try:
            return w.statement_execution.execute_statement(
                warehouse_id=warehouse_id,
                statement=statement,
                wait_timeout="30s"
            )
        except TooManyRequests:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited, retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)
        except Exception as e:
            # Non-retryable errors are logged and re-raised immediately
            print(f"Unexpected error: {e}")
            raise

# Usage
w = WorkspaceClient()
result = execute_with_retry(
    w=w,
    warehouse_id="abc123def456",
    statement="SELECT * FROM catalog.schema.large_table"
)
</div>
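
The "Cache Results" practice from the table above can be sketched in the same spirit. The example below is a minimal in-process cache with a time-to-live, keyed by the SQL text. The `QueryResultCache` class, the 300-second TTL, and the fake `run_query` function are hypothetical; a production setup would more likely use a shared cache such as Redis, as the table suggests.

```python
import time


class QueryResultCache:
    """Cache query results in memory for a fixed time-to-live (TTL)."""

    def __init__(self, run_query, ttl_s=300, clock=time.monotonic):
        self._run_query = run_query  # callable that executes a statement and returns its result
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # statement text -> (expires_at, result)

    def get(self, statement):
        now = self._clock()
        entry = self._entries.get(statement)
        if entry is not None and now < entry[0]:
            return entry[1]  # Cache hit: no warehouse query issued
        result = self._run_query(statement)
        self._entries[statement] = (now + self._ttl_s, result)
        return result


# Demo with a fake clock and a fake query function (no warehouse needed)
fake_now = [0.0]
calls = []


def fake_run_query(statement):
    calls.append(statement)
    return f"result-{len(calls)}"


cache = QueryResultCache(fake_run_query, ttl_s=300, clock=lambda: fake_now[0])
sql = "SELECT COUNT(*) FROM catalog.schema.orders"

r1 = cache.get(sql)
fake_now[0] = 100.0   # within TTL: served from cache
r2 = cache.get(sql)
fake_now[0] = 400.0   # past TTL: query runs again
r3 = cache.get(sql)
print(r1, r2, r3, len(calls))
```

The TTL should be chosen from the data freshness SLA for each consumer, since a cached result can be up to `ttl_s` seconds stale.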

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);

        block.innerHTML =
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';

        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);

        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Summary and Next Steps

Consumer integration is the culmination of your migration - enabling users to extract value from the platform. Success requires:

1. **BI Connectivity** - Standards-based integrations with validated performance
2. **SLA Management** - Measurable commitments with monitoring and alerting
3. **ML Integration** - Governed feature and model pipelines
4. **API Access** - Secure, scalable programmatic access patterns

**Key Takeaways:**
- Segment consumers by business impact and prioritize accordingly
- Use Partner Connect for BI tools when available
- Validate SLAs before cutover, monitor continuously after
- Always use Service Principals for production workloads
- Implement retry logic and error handling for all API consumers

**Next Steps:**

Continue to **[5.3 - Developer Enablement and Adoption]($./5.3 - Developer Enablement and Adoption)** to learn how to scale adoption across your organization.

<div style="color: #FF3621; font-weight: bold; font-size: 2em; margin-bottom: 12px;">COURSE DEVELOPER (remove before publishing)</div>

### Template Customization

**Placeholders to replace:**
- `{SOURCE_PLATFORM}` - Source platform name (Snowflake, BigQuery, Redshift, etc.)
- `<workspace-url>` - Databricks workspace URL placeholder
- `<warehouse-id>` - SQL Warehouse ID placeholder
- `<service-principal-token>` - Token placeholder for examples

**Platform-Specific Content to Add:**

| Section | Customization Needed |
|---------|---------------------|
| **BI Tool Migration** | Document {SOURCE_PLATFORM}-specific dashboard features that require refactoring |
| **SLA Comparison** | Provide typical {SOURCE_PLATFORM} vs Databricks performance benchmarks |
| **ML Integration** | Compare {SOURCE_PLATFORM} ML capabilities with Databricks Feature Store/Model Registry |
| **API Patterns** | Document common {SOURCE_PLATFORM} API patterns and Databricks equivalents |

**Additional Resources to Include:**
- Partner Connect documentation links for specific BI tools
- SQL Warehouse sizing calculator or guidelines
- Model Serving pricing calculator
- SDK documentation references (Python, Go, Java)
- Sample consumer integration repositories (GitHub)

**Testing Recommendations:**
- Validate all code examples in a real workspace
- Test BI tool connections with actual Partner Connect flow
- Verify model serving examples with deployed endpoint
- Ensure SDK examples work with current versions

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>