<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">04 - Activate</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Cutover Execution

## Overview

Cutover is the critical moment when you transition production workloads from {SOURCE_PLATFORM} to Databricks. This lesson covers four cutover strategies (blue-green, canary, phased, and big bang), along with rollback planning, stakeholder communication, and post-cutover validation.

## Learning Objectives

By the end of this lesson, you will be able to:
- Select the appropriate cutover strategy for your migration
- Plan and execute blue-green and canary deployments
- Implement rollback procedures for failed cutovers
- Coordinate stakeholder communication during cutover
- Validate production readiness post-cutover

## Cutover Strategies

Choose a cutover strategy based on your risk tolerance, downtime requirements, and complexity. Each strategy offers different trade-offs between safety and simplicity.

<br />
<div class="mermaid">
flowchart TB
    subgraph STRATEGIES["<b>Cutover Strategies</b>"]
        BG["<b>Blue-Green</b><br/><i>Parallel environments,<br/>instant switch</i>"]
        CAN["<b>Canary</b><br/><i>Gradual traffic shift,<br/>progressive rollout</i>"]
        PHASE["<b>Phased</b><br/><i>Workload-by-workload<br/>migration</i>"]
        BIG["<b>Big Bang</b><br/><i>All-at-once switch,<br/>planned downtime</i>"]
    end
    BG --> |"Lowest Risk"| SAFE["Instant Rollback"]
    CAN --> |"Medium Risk"| MONITOR["Monitor & Adjust"]
    PHASE --> |"Low Risk"| ISOLATE["Isolated Failures"]
    BIG --> |"Highest Risk"| SIMPLE["Simplest Execution"]
    style BG fill:#e8f5e9,stroke:#4caf50
    style CAN fill:#e3f2fd,stroke:#1976d2
    style PHASE fill:#fff3e0,stroke:#ff9800
    style BIG fill:#ffebee,stroke:#f44336
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### Strategy Comparison

| Strategy | Downtime | Risk Level | Rollback Speed | Best For |
|----------|----------|------------|----------------|----------|
| **Blue-Green** | Zero | Low | Instant | Mission-critical workloads |
| **Canary** | Zero | Low-Medium | Fast | High-traffic consumer apps |
| **Phased** | Minimal | Low | Per-workload | Complex, heterogeneous migrations |
| **Big Bang** | Planned | High | Slow | Simple migrations, acceptable downtime |

## Blue-Green Deployment

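Blue-green deployment maintains two parallel production environments. At cutover, traffic is switched in a single step from "blue" ({SOURCE_PLATFORM}) to "green" (Databricks), and it can be switched back just as quickly if issues arise. If the switch relies on DNS, keep record TTLs short so that clients pick up the change promptly.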

<br />
<div class="mermaid">
flowchart LR
    subgraph BEFORE["<b>Before Cutover</b>"]
        LB1["Load Balancer"] --> BLUE1["BLUE<br/>{SOURCE_PLATFORM}<br/>✓ Active"]
        LB1 -.-> GREEN1["GREEN<br/>Databricks<br/>Standby"]
    end
    subgraph AFTER["<b>After Cutover</b>"]
        LB2["Load Balancer"] -.-> BLUE2["BLUE<br/>{SOURCE_PLATFORM}<br/>Standby"]
        LB2 --> GREEN2["GREEN<br/>Databricks<br/>✓ Active"]
    end
    BEFORE --> |"DNS/Route Switch"| AFTER
    style BLUE1 fill:#e3f2fd,stroke:#1976d2
    style GREEN1 fill:#e8f5e9,stroke:#4caf50,stroke-dasharray: 5 5
    style BLUE2 fill:#e3f2fd,stroke:#1976d2,stroke-dasharray: 5 5
    style GREEN2 fill:#e8f5e9,stroke:#4caf50
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### Blue-Green Implementation Checklist

| Phase | Actions |
|-------|--------|
| **Preparation** | Deploy Databricks environment, sync data, validate parity |
| **Pre-Cutover** | Freeze source writes, run final delta sync, run validation |
| **Cutover** | Update DNS/connection strings, switch traffic |
| **Validation** | Smoke tests, monitor dashboards, verify SLAs |
| **Rollback Ready** | Keep {SOURCE_PLATFORM} warm for instant fallback |

<div style="border-left: 4px solid #4caf50; background: #e8f5e9; padding: 16px 20px; border-radius: 4px; margin: 16px 0;">
    <div style="display: flex; align-items: flex-start; gap: 12px;">
        <span style="font-size: 24px;">✓</span>
        <div>
            <strong style="color: #2e7d32; font-size: 1.1em;">Recommended for Mission-Critical Workloads</strong>
            <p style="margin: 8px 0 0 0; color: #333;">
                Blue-green is the safest cutover strategy for production workloads. The ability to instantly roll back by switching traffic makes it ideal for migrations where downtime is unacceptable.
            </p>
        </div>
    </div>
</div>
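
The switch-and-fall-back behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the `BlueGreenRouter` class, the `cutover` helper, and the endpoint names are all hypothetical, and a real cutover would update DNS records, a load balancer, or BI connection settings instead of an in-memory pointer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BlueGreenRouter:
    """Tracks which environment currently serves traffic (illustrative only)."""
    endpoints: Dict[str, str]
    active: str = "blue"
    history: List[str] = field(default_factory=list)

    def switch_to(self, target: str) -> None:
        if target not in self.endpoints:
            raise ValueError(f"unknown environment: {target}")
        self.history.append(self.active)  # remember where to fall back to
        self.active = target

    def rollback(self) -> None:
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.active = self.history.pop()

    def current_endpoint(self) -> str:
        return self.endpoints[self.active]


def cutover(router: BlueGreenRouter, smoke_tests: List[Callable[[], bool]]) -> bool:
    """Switch to green, run smoke tests, and roll back if any test fails."""
    router.switch_to("green")
    if all(test() for test in smoke_tests):
        return True
    router.rollback()
    return False


if __name__ == "__main__":
    # Hypothetical endpoint names; replace with your own.
    router = BlueGreenRouter({
        "blue": "source-platform.example.internal",
        "green": "databricks.example.internal",
    })
    succeeded = cutover(router, [lambda: True, lambda: False])
    print(succeeded, router.active)
```

Because the blue environment is never torn down during the switch, rolling back is the same operation as cutting over, only in the opposite direction.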

## Canary Deployment

Canary deployment gradually shifts traffic from {SOURCE_PLATFORM} to Databricks, allowing you to monitor behavior at each stage before committing to a full cutover.

<br />
<div class="mermaid">
flowchart LR
    subgraph STAGE1["<b>Stage 1: 10%</b>"]
        S1["Traffic"] --> S1A["90% → {SOURCE_PLATFORM}"]
        S1 --> S1B["10% → Databricks"]
    end
    subgraph STAGE2["<b>Stage 2: 50%</b>"]
        S2["Traffic"] --> S2A["50% → {SOURCE_PLATFORM}"]
        S2 --> S2B["50% → Databricks"]
    end
    subgraph STAGE3["<b>Stage 3: 100%</b>"]
        S3["Traffic"] --> S3B["100% → Databricks"]
    end
    STAGE1 --> |"Monitor"| STAGE2 --> |"Monitor"| STAGE3
    style S1B fill:#e8f5e9,stroke:#4caf50
    style S2B fill:#e8f5e9,stroke:#4caf50
    style S3B fill:#e8f5e9,stroke:#4caf50
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### Canary Stages

| Stage | Traffic to Databricks | Duration | Gate Criteria |
|-------|----------------------|----------|---------------|
| **Canary** | 5-10% | 1-2 hours | No errors, latency within SLA |
| **Early Adopters** | 25% | 4-8 hours | Error rate < 0.1%, positive feedback |
| **Majority** | 50% | 24 hours | All metrics stable |
| **Full** | 100% | Permanent | Business sign-off |
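
The gate criteria in the table can be expressed as a small decision function. The sketch below is one possible encoding under stated assumptions: the `Gate` class and `next_action` function are hypothetical names, the "Majority" error threshold is assumed because the table only says "all metrics stable", and qualitative gates such as user feedback and business sign-off are left to people.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    name: str
    traffic_pct: int
    min_hours: float                    # minimum soak time before promotion
    error_ok: Callable[[float], bool]   # predicate on error rate (as a fraction)


GATES = [
    Gate("Canary", 10, 1.0, lambda e: e == 0.0),           # "No errors"
    Gate("Early Adopters", 25, 4.0, lambda e: e < 0.001),  # "Error rate < 0.1%"
    Gate("Majority", 50, 24.0, lambda e: e < 0.001),       # assumed threshold
]


def next_action(gate: Gate, error_rate: float, latency_ms: float,
                latency_sla_ms: float, hours_elapsed: float) -> str:
    """Return 'hold', 'wait', or 'promote' for the current stage."""
    if not gate.error_ok(error_rate) or latency_ms > latency_sla_ms:
        return "hold"      # gate criteria not met; investigate before moving on
    if hours_elapsed < gate.min_hours:
        return "wait"      # healthy so far, but the stage has not run long enough
    return "promote"       # move to the next traffic percentage


if __name__ == "__main__":
    print(next_action(GATES[0], error_rate=0.0, latency_ms=800,
                      latency_sla_ms=1000, hours_elapsed=1.5))
```

Checking health before elapsed time means an unhealthy stage is flagged immediately instead of waiting out its full duration.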

### Canary Implementation

<div class="code-block" data-language="python">
# Example: Canary routing logic for BI tool connections
import hashlib

def get_connection_string(user_id: str, canary_percentage: int = 10) -> str:
    """
    Route users to Databricks or {SOURCE_PLATFORM} based on canary percentage.
    Uses a stable hash of the user ID so each user always gets the same destination.
    """
    # Python's built-in hash() is salted per process for strings, so use a
    # deterministic digest to keep routing stable across restarts and hosts.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    user_bucket = int(digest, 16) % 100  # bucket in the range 0-99

    if user_bucket < canary_percentage:
        return "databricks://my-workspace.cloud.databricks.com/sql/warehouses/abc123"
    else:
        return "{SOURCE_PLATFORM}://account.region.{SOURCE_PLATFORM}computing.com/warehouse"

# Gradually increase canary percentage
canary_stages = [
    {"percentage": 10, "duration_hours": 2},
    {"percentage": 25, "duration_hours": 8},
    {"percentage": 50, "duration_hours": 24},
    {"percentage": 100, "duration_hours": None}  # Final
]
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-python.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'python';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Phased Migration

Phased migration moves workloads incrementally, typically by business domain or priority. This confines any failure to the workload being migrated and lets later phases apply lessons learned in earlier ones.

<br />
<div class="mermaid">
gantt
    title Phased Migration Timeline
    dateFormat  YYYY-MM-DD
    section Phase 1
    Analytics Workloads    :done, p1, 2024-01-01, 30d
    section Phase 2
    ETL Pipelines          :active, p2, 2024-02-01, 45d
    section Phase 3
    ML Workloads           :p3, 2024-03-15, 30d
    section Phase 4
    BI & Reporting         :p4, 2024-04-15, 30d
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### Phase Selection Criteria

| Criteria | Phase 1 (Pilot) | Phase 2-3 (Core) | Phase 4+ (Tail) |
|----------|-----------------|------------------|------------------|
| **Complexity** | Low | Medium-High | Any |
| **Business Impact** | Low-Medium | High | Variable |
| **Dependencies** | Minimal | Moderate | Complex |
| **Team Readiness** | Champions | Broader team | All teams |
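
As a rough illustration, the criteria above could be applied to a workload inventory with a simple classifier. Everything here is an assumption made for the sketch: the `Workload` fields, the rating vocabulary, and the rule order (complex dependencies push a workload to the tail first). Real phase planning also weighs team readiness, which is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    complexity: str        # "low" | "medium" | "high"
    business_impact: str   # "low" | "medium" | "high"
    dependencies: str      # "minimal" | "moderate" | "complex"


def assign_phase(w: Workload) -> str:
    """Map a workload to a phase using the selection criteria table."""
    if w.dependencies == "complex":
        return "Phase 4+ (Tail)"
    if (w.complexity == "low"
            and w.business_impact in ("low", "medium")
            and w.dependencies == "minimal"):
        return "Phase 1 (Pilot)"
    return "Phase 2-3 (Core)"


if __name__ == "__main__":
    # Hypothetical inventory entries.
    inventory = [
        Workload("ad-hoc analytics", "low", "low", "minimal"),
        Workload("nightly ETL", "high", "high", "moderate"),
        Workload("finance reporting", "medium", "high", "complex"),
    ]
    for w in inventory:
        print(f"{w.name}: {assign_phase(w)}")
```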

## Rollback Planning

Every cutover must have a documented rollback plan. Define triggers, procedures, and responsibilities before the cutover window.

<br />
<div class="mermaid">
flowchart TB
    CUTOVER["Cutover Executed"] --> MONITOR["Monitor for Issues"]
    MONITOR --> |"No Issues"| SUCCESS["✓ Cutover Successful"]
    MONITOR --> |"Issues Detected"| ASSESS{"Severity?"}
    ASSESS --> |"Critical"| ROLLBACK["Immediate Rollback"]
    ASSESS --> |"Medium"| DECIDE{"Fixable in Window?"}
    ASSESS --> |"Low"| HOTFIX["Apply Hotfix"]
    DECIDE --> |"Yes"| FIX["Apply Fix"]
    DECIDE --> |"No"| ROLLBACK
    ROLLBACK --> RESTORE["Restore {SOURCE_PLATFORM}"]
    RESTORE --> POSTMORTEM["Post-Incident Review"]
    style SUCCESS fill:#e8f5e9,stroke:#4caf50
    style ROLLBACK fill:#ffebee,stroke:#f44336
    style RESTORE fill:#fff3e0,stroke:#ff9800
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### Rollback Triggers

Define clear criteria, and the action each one triggers, ranging from automatic rollback to an alert that requires a manual decision:

| Trigger | Threshold | Action |
|---------|-----------|--------|
| **Error Rate** | > 1% for 5 minutes | Automatic rollback |
| **Latency** | > 2x baseline for 10 minutes | Alert + manual decision |
| **Data Freshness** | > 30 minutes stale | Alert + investigate |
| **Business Validation** | Critical report failure | Manual rollback |
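
The thresholds in the table translate directly into checks over recent metric samples. The following sketch assumes one sample per minute and uses hypothetical function names (`sustained_breach`, `evaluate_triggers`); in practice these rules would usually live in your monitoring or alerting system.

```python
from typing import List, Sequence


def sustained_breach(samples: Sequence[float], threshold: float, minutes: int) -> bool:
    """True if the most recent `minutes` one-minute samples all exceed `threshold`."""
    if len(samples) < minutes:
        return False
    return all(s > threshold for s in samples[-minutes:])


def evaluate_triggers(error_rates: Sequence[float],
                      latencies_ms: Sequence[float],
                      baseline_latency_ms: float,
                      minutes_stale: float,
                      critical_report_failed: bool) -> List[str]:
    """Return the actions from the rollback trigger table that currently apply."""
    actions = []
    if sustained_breach(error_rates, 0.01, 5):                        # > 1% for 5 minutes
        actions.append("automatic rollback")
    if sustained_breach(latencies_ms, 2 * baseline_latency_ms, 10):   # > 2x baseline for 10 minutes
        actions.append("alert + manual decision")
    if minutes_stale > 30:                                            # data > 30 minutes stale
        actions.append("alert + investigate")
    if critical_report_failed:
        actions.append("manual rollback")
    return actions


if __name__ == "__main__":
    print(evaluate_triggers(
        error_rates=[0.02] * 5,
        latencies_ms=[120.0] * 10,
        baseline_latency_ms=100.0,
        minutes_stale=5,
        critical_report_failed=False,
    ))
```

Requiring every sample in the window to breach the threshold avoids rolling back on a single transient spike.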

<div style="border-left: 4px solid #f44336; background: #ffebee; padding: 16px 20px; border-radius: 4px; margin: 16px 0;">
    <div style="display: flex; align-items: flex-start; gap: 12px;">
        <span style="font-size: 24px;">⚠️</span>
        <div>
            <strong style="color: #c62828; font-size: 1.1em;">Rollback Time Limit</strong>
            <p style="margin: 8px 0 0 0; color: #333;">
                For blue-green deployments, keep {SOURCE_PLATFORM} warm for at least 24-48 hours post-cutover. After this window, rollback becomes significantly more complex due to data divergence.
            </p>
        </div>
    </div>
</div>

## Cutover Runbook

A detailed runbook ensures consistent execution. This example shows a blue-green cutover procedure.

<div class="code-block" data-language="markdown">
# Cutover Runbook: {SOURCE_PLATFORM} to Databricks

## Pre-Cutover (T-24 hours)
- [ ] Confirm validation tests passing
- [ ] Notify stakeholders of cutover window
- [ ] Verify rollback procedures tested
- [ ] Confirm on-call team availability

## Cutover Window (T-0)
- [ ] T+0:00 - Announce cutover start
- [ ] T+0:05 - Enable source write freeze
- [ ] T+0:10 - Run final delta sync
- [ ] T+0:30 - Validate data parity
- [ ] T+0:45 - Update DNS/connection strings
- [ ] T+1:00 - Run smoke tests on Databricks
- [ ] T+1:15 - Confirm consumer connectivity
- [ ] T+1:30 - Declare cutover complete or rollback

## Post-Cutover (T+24 hours)
- [ ] Monitor dashboards continuously
- [ ] Address user-reported issues
- [ ] Validate SLA compliance
- [ ] Obtain business sign-off
- [ ] Begin {SOURCE_PLATFORM} decommission planning
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-markdown.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'markdown';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>
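
To share the runbook with stakeholders, it can help to convert the step offsets into wall-clock times for the agreed window. The sketch below does that arithmetic; the `STEPS` list mirrors the cutover window steps as minutes from the start, while the `schedule` function and the example start time are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (minutes after cutover start, step), taken from the cutover window above
STEPS = [
    (0, "Announce cutover start"),
    (5, "Enable source write freeze"),
    (10, "Run final delta sync"),
    (30, "Validate data parity"),
    (45, "Update DNS/connection strings"),
    (60, "Run smoke tests on Databricks"),
    (75, "Confirm consumer connectivity"),
    (90, "Declare cutover complete or rollback"),
]


def schedule(start: datetime) -> List[Tuple[str, str]]:
    """Return (HH:MM, step) pairs for a cutover window beginning at `start`."""
    return [
        ((start + timedelta(minutes=offset)).strftime("%H:%M"), step)
        for offset, step in STEPS
    ]


if __name__ == "__main__":
    # Example start time only; use your approved change window.
    for clock_time, step in schedule(datetime(2030, 1, 5, 22, 0)):
        print(f"{clock_time}  {step}")
```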

## Post-Cutover Validation

After cutover, systematically validate that all systems are functioning correctly before declaring success.

### Validation Checklist

| Area | Validation | Method |
|------|------------|--------|
| **Data Pipelines** | All jobs running on schedule | System Tables query |
| **Data Quality** | No anomalies detected | Lakehouse Monitoring |
| **BI Dashboards** | All reports loading correctly | Manual + automated tests |
| **API Consumers** | Response times within SLA | APM monitoring |
| **User Access** | All users can authenticate | Access audit logs |
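
One way to keep the checklist honest is to run each area as a named check and record the outcome, so that sign-off depends on every area passing. This is a minimal sketch with hypothetical names (`run_validation`, `ready_for_signoff`); the lambda checks are stand-ins for real probes such as a System Tables query or a dashboard load test.

```python
from typing import Callable, Dict


def run_validation(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each check and record 'pass', 'fail', or the error it raised."""
    results = {}
    for area, check in checks.items():
        try:
            results[area] = "pass" if check() else "fail"
        except Exception as exc:  # a crashing check must not hide the other results
            results[area] = f"error: {exc}"
    return results


def ready_for_signoff(results: Dict[str, str]) -> bool:
    """Only declare success when every area passed."""
    return all(status == "pass" for status in results.values())


if __name__ == "__main__":
    # Placeholder checks; replace each lambda with a real probe.
    results = run_validation({
        "Data Pipelines": lambda: True,
        "Data Quality": lambda: True,
        "BI Dashboards": lambda: False,
    })
    for area, status in results.items():
        print(f"{area}: {status}")
    print("Ready for sign-off:", ready_for_signoff(results))
```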

### Post-Cutover Monitoring Query

<div class="code-block" data-language="sql">
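-- Monitor job health post-cutover
-- job_run_timeline has no job_name column and can split a long run across
-- several rows, so collapse to one row per run and join system.lakeflow.jobs.
WITH runs AS (
    SELECT
        workspace_id, job_id, run_id,
        MAX(result_state) AS result_state,  -- populated only on the run's final row
        DATEDIFF(SECOND, MIN(period_start_time), MAX(period_end_time)) AS duration_sec
    FROM system.lakeflow.job_run_timeline
    WHERE period_start_time >= CURRENT_TIMESTAMP - INTERVAL 24 HOURS
    GROUP BY workspace_id, job_id, run_id
),
latest_jobs AS (
    SELECT workspace_id, job_id, name AS job_name
    FROM system.lakeflow.jobs
    QUALIFY ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) = 1
)
SELECT 
    j.job_name,
    COUNT(*) AS total_runs,
    SUM(CASE WHEN r.result_state = 'SUCCEEDED' THEN 1 ELSE 0 END) AS successful,
    SUM(CASE WHEN r.result_state = 'FAILED' THEN 1 ELSE 0 END) AS failed,
    ROUND(AVG(r.duration_sec), 2) AS avg_duration_sec
FROM runs r
JOIN latest_jobs j ON r.workspace_id = j.workspace_id AND r.job_id = j.job_id
GROUP BY j.job_name
HAVING SUM(CASE WHEN r.result_state = 'FAILED' THEN 1 ELSE 0 END) > 0
ORDER BY failed DESC;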

-- Check query performance vs baseline
-- Assumption: system.query.history exposes no query_hash column, so a hash of
-- statement_text stands in as the grouping key (it matches identical text only).
WITH baseline AS (
    SELECT SHA2(statement_text, 256) AS query_hash, AVG(total_duration_ms) AS baseline_duration
    FROM system.query.history
    WHERE start_time BETWEEN CURRENT_DATE - INTERVAL 7 DAYS AND CURRENT_DATE - INTERVAL 1 DAY
    GROUP BY SHA2(statement_text, 256)
),
recent AS (
    SELECT SHA2(statement_text, 256) AS query_hash, AVG(total_duration_ms) AS current_duration
    FROM system.query.history
    WHERE start_time >= CURRENT_DATE
    GROUP BY SHA2(statement_text, 256)
)
SELECT 
    r.query_hash,
    b.baseline_duration,
    r.current_duration,
    (r.current_duration - b.baseline_duration) / b.baseline_duration * 100 AS pct_change
FROM recent r
JOIN baseline b ON r.query_hash = b.query_hash
WHERE b.baseline_duration > 0  -- avoid division by zero
  AND r.current_duration > b.baseline_duration * 1.5  -- 50%+ slower
ORDER BY pct_change DESC;
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Summary

### Cutover Checklist

- [ ] Select cutover strategy (blue-green, canary, phased, big bang)
- [ ] Document rollback triggers and procedures
- [ ] Create detailed runbook with timing and owners
- [ ] Communicate cutover window to stakeholders
- [ ] Execute cutover with real-time monitoring
- [ ] Validate all systems post-cutover
- [ ] Obtain business sign-off before closing rollback window

### Strategy Selection Guide

| If You Need... | Use This Strategy |
|----------------|-------------------|
| Zero downtime, instant rollback | Blue-Green |
| Gradual rollout with monitoring | Canary |
| Incremental migration by workload | Phased |
| Simplest execution, downtime acceptable | Big Bang |
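
For teams that want to record the decision explicitly, the guide above can be written as a small function. This is only a sketch: the `select_strategy` name, its parameters, and especially the priority order when several needs apply at once are assumptions, and real migrations often combine strategies (for example, phased by workload with a blue-green switch inside each phase).

```python
def select_strategy(zero_downtime: bool,
                    gradual_rollout: bool,
                    migrate_by_workload: bool) -> str:
    """Pick a cutover strategy from the selection guide (assumed priority order)."""
    if migrate_by_workload:
        return "Phased"
    if gradual_rollout:
        return "Canary"
    if zero_downtime:
        return "Blue-Green"
    return "Big Bang"  # simplest execution when planned downtime is acceptable


if __name__ == "__main__":
    print(select_strategy(zero_downtime=True, gradual_rollout=False,
                          migrate_by_workload=False))
```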

### Next Steps

With cutover complete, move to the Enable phase:

- [**5.1 - Platform Operations and Cost Management**]($./../05 - Enable/5.1 - Platform Operations and Cost Management) - Operationalize your Databricks environment

<div style="color: #FF3621; font-weight: bold; font-size: 2em; margin-bottom: 12px;">COURSE DEVELOPER (remove before publishing)</div>

### Template Customization

**Placeholders to replace:**
- `{SOURCE_PLATFORM}` - Source platform name

**Platform-specific additions:**
- Add platform-specific connection string switching examples
- Document platform-specific rollback procedures
- Include platform-specific timing considerations

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>
