<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">01 - Discover</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Migration Strategies

## Overview

Choosing the right migration strategy is critical for success. This module covers high-level approaches to migrating from **{SOURCE_PLATFORM}** to **Databricks**, helping you select the strategy that best fits your organization's needs, risk tolerance, and timeline.

## Learning Objectives

By the end of this lesson, you will be able to:
- Evaluate different migration strategies and their tradeoffs
- Select the appropriate approach for your organization
- Understand phased vs big-bang migration considerations
- Apply a prioritization framework for workload migration

## Migration Prioritization

There are two primary migration strategies when moving from {SOURCE_PLATFORM} to Databricks. The choice between them depends on your organization's main driver for migration:

| Strategy | Primary Driver | Approach |
|----------|---------------|----------|
| **ETL-First** | Cost reduction | Migrate data pipelines first, keep {SOURCE_PLATFORM} for serving temporarily |
| **BI-First** | New capabilities | Migrate analytics/reporting first, keep {SOURCE_PLATFORM} for ETL temporarily |

Both strategies result in full migration to Databricks - they differ in **what you migrate first** and **where you see value soonest**.

### Strategy Selection Factors

The right migration approach depends on your organization's priorities and constraints. Key factors include:

| Factor | Considerations |
|--------|----------------|
| **Urgency and timelines** | Contract deadlines, budget cycles, executive mandates |
| **Workload dependencies** | Integrated vs isolated pipelines |
| **Current architecture** | Shared vs isolated warehouses, existing limitations |
| **Business requirements** | Roadmap backlog, new capability needs |
| **Migration resources** | Team availability, partner assistance, tooling access |

If your primary pain point is **compute costs**, ETL-First typically delivers faster savings. If you need **AI/ML capabilities** or want to break data silos, BI-First unlocks value sooner.

<div class="mermaid">
flowchart LR
    A["What is your<br/>primary driver?"] --> B{"Cost<br/>reduction?"}
    B -->|Yes| C["<b>ETL-First</b>"]
    B -->|No| D{"New capabilities<br/>(AI/ML)?"}
    D -->|Yes| E["<b>BI-First</b>"]
    style C fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    style E fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>
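The decision flowchart above can be expressed as a small function. This is a minimal sketch, assuming your drivers are captured as two hypothetical booleans (`cost_reduction`, `new_capabilities`). It mirrors only the branches the flowchart shows, checks cost reduction first, and returns `None` when neither driver applies, because the flowchart leaves that case open.

```python
from typing import Optional


def choose_strategy(cost_reduction: bool, new_capabilities: bool) -> Optional[str]:
    """Mirror the strategy flowchart: the cost-reduction question is asked first."""
    if cost_reduction:
        return "ETL-First"
    if new_capabilities:
        return "BI-First"
    # The flowchart has no branch for "neither"; revisit the selection factors.
    return None


print(choose_strategy(cost_reduction=True, new_capabilities=False))   # ETL-First
print(choose_strategy(cost_reduction=False, new_capabilities=True))   # BI-First
print(choose_strategy(cost_reduction=False, new_capabilities=False))  # None
```

In practice both drivers are often present; this sketch follows the flowchart's ordering and lets cost reduction win, which is a simplification of the selection factors in the table above.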

## ETL-First

Lead with migrating data ingestion and transformation workloads. Land all data in cloud object storage as Delta Lake or Apache Iceberg tables, run ETL on Databricks, and sync the processed data back to {SOURCE_PLATFORM} for serving until consumers are migrated.

This approach allows continued use of {SOURCE_PLATFORM} in the interim for serving downstream applications and BI dashboards, minimizing disruption to end users.

<br />
<div class="mermaid">
flowchart LR
    subgraph ETL["<b>ETL-First</b>"]
        direction LR
        E1["Data<br/>Sources"] --> E2["<b>Databricks</b><br/>Bronze/Silver/<i>Gold</i>"]
        E2 --> E3["{SOURCE_PLATFORM}<br/>Serving/<i>Gold</i>"]
        E3 --> E4["BI / Apps"]
    end
    style ETL fill:#fff,stroke:#FF3621,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### When to Use

| Scenario | Fit |
|----------|-----|
| ETL/compute costs are the primary cost driver | Excellent |
| Complex pipelines with known performance issues | Good |
| BI tools are difficult to repoint quickly | Good |
| Need to demonstrate quick cost savings | Excellent |

### Tradeoffs

| 🟢 Advantages | 🟡 Considerations |
|------------|----------------|
| Fastest path to cost savings | Requires reverse sync to {SOURCE_PLATFORM} temporarily |
| Lower risk - BI unchanged initially | Temporary duplicate data between platforms |
| Validates Databricks ETL capability early | Two systems to maintain during transition |
| Allows team to learn gradually | More complex intermediate architecture |

### Steps

| Step | Action |
|------|--------|
| **1** | Ingest source data into Databricks using Auto Loader, Lakeflow Connect, or Spark Declarative Pipelines, and rebuild transformations as Bronze/Silver/Gold pipelines |
| **2** | Reverse sync Gold tables to {SOURCE_PLATFORM} via Delta Sharing, COPY INTO, or CDC |
| **3** | Validate data parity between platforms during parallel run |
| **4** | Migrate consumers to Databricks and decommission {SOURCE_PLATFORM} serving |
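Step 3 can start with a simple row-level comparison before moving to fuller reconciliation. The sketch below is illustrative only: it assumes both tables have already been pulled into memory as lists of dicts with a unique key column (the `id` column and sample rows are hypothetical) and compares a per-row hash. At production scale the same idea would normally run as SQL or Spark aggregations on each platform rather than in plain Python.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List


def row_hash(row: Dict[str, Any]) -> str:
    """Stable hash of a row: keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(source: Iterable[Dict[str, Any]],
                   target: Iterable[Dict[str, Any]],
                   key: str) -> Dict[str, List[Any]]:
    """Report keys missing on either side and keys whose row contents differ.

    Assumes `key` is unique in each table; duplicate keys would collapse.
    """
    src = {r[key]: row_hash(r) for r in source}
    tgt = {r[key]: row_hash(r) for r in target}
    return {
        "missing_in_target": sorted(k for k in src if k not in tgt),
        "missing_in_source": sorted(k for k in tgt if k not in src),
        "mismatched": sorted(k for k in src if k in tgt and src[k] != tgt[k]),
    }


source_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 20.0}, {"id": 3, "amount": 30.0}]
target_rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.0}]
print(compare_tables(source_rows, target_rows, key="id"))
# {'missing_in_target': [3], 'missing_in_source': [], 'mismatched': [2]}
```

Because the hash is computed on serialized values, normalize types that differ between platforms (for example decimals versus floats, or timestamp precision) before hashing; otherwise identical data can be reported as mismatched.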

### Databricks Solutions Used

| Purpose | Options |
|---------|---------|
| **Ingestion** | Auto Loader, Lakeflow Connect, Spark Declarative Pipelines |
| **Transformation** | Spark Declarative Pipelines, Lakeflow Jobs, Databricks notebooks |
| **Reverse Sync** | Delta Sharing, Spark {SOURCE_PLATFORM} Connector, external stages |
| **Orchestration** | Lakeflow Jobs |

## BI-First

Lead with modernizing the reporting layer by replicating Gold/presentation tables from {SOURCE_PLATFORM} into Databricks. This unlocks new capabilities like AI/ML and cross-functional analytics while existing ETL continues running in {SOURCE_PLATFORM}.

<br />
<div class="mermaid">
flowchart LR
    subgraph BI["<b>BI-First</b>"]
        direction LR
        B1["Data<br/>Sources"] --> B2["{SOURCE_PLATFORM}<br/>ETL"]
        B2 --> B3["<b>Databricks</b><br/>Gold/Analytics"]
        B3 --> B4["BI / Apps"]
    end
    style BI fill:#fff,stroke:#FF3621,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

### When to Use

| Scenario | Fit |
|----------|-----|
| New AI/ML capabilities are the primary driver | Excellent |
| Want to evaluate Databricks with real production data | Excellent |
| Current ETL is stable and working well | Good |
| BI performance improvements are needed | Good |

### Tradeoffs

| 🟢 Advantages | 🟡 Considerations |
|------------|----------------|
| Enables new capabilities (AI/ML) immediately | ETL costs unchanged initially |
| Lower risk - existing ETL unchanged | Requires data sync from {SOURCE_PLATFORM} |
| Quick demonstration of value to users | Two platforms maintained longer |
| Validates Databricks for analytics workloads | Delayed full platform consolidation |


### Steps

| Step | Action |
|------|--------|
| **1** | Continue existing ETL in {SOURCE_PLATFORM} |
| **2** | Sync Gold tables to Databricks via Delta Sharing, the Spark {SOURCE_PLATFORM} Connector, or CDC |
| **3** | Repoint BI tools and enable AI/ML workloads on Databricks |
| **4** | Migrate ETL pipelines to Databricks and decommission {SOURCE_PLATFORM} |
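The CDC option in Step 2 comes down to replaying ordered change events against the replica. This is a minimal in-memory sketch, assuming each event carries hypothetical fields: an `op` (`insert`, `update`, or `delete`), a primary key `id`, a monotonically increasing `seq`, and the new `row` for inserts and updates. On Databricks the same logic would typically be expressed as a `MERGE INTO` statement or with the CDC support in Spark Declarative Pipelines rather than with Python dicts.

```python
from typing import Any, Dict, List


def apply_changes(target: Dict[Any, Dict[str, Any]],
                  events: List[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Apply CDC events to a keyed replica, processing them in sequence order."""
    replica = dict(target)  # leave the caller's table untouched
    for event in sorted(events, key=lambda e: e["seq"]):
        key = event["id"]
        if event["op"] == "delete":
            replica.pop(key, None)  # deleting a missing key is a no-op
        elif event["op"] in ("insert", "update"):
            # Upsert semantics: insert if absent, overwrite if present.
            replica[key] = event["row"]
        else:
            raise ValueError(f"unknown op: {event['op']!r}")
    return replica


gold = {1: {"id": 1, "status": "open"}}
changes = [
    {"seq": 2, "op": "update", "id": 1, "row": {"id": 1, "status": "closed"}},
    {"seq": 1, "op": "insert", "id": 2, "row": {"id": 2, "status": "open"}},
    {"seq": 3, "op": "delete", "id": 2},
]
print(apply_changes(gold, changes))
# {1: {'id': 1, 'status': 'closed'}}
```

Sorting by `seq` matters: change events can arrive out of order, and applying the delete for key 2 before its insert would leave a stale row in the replica.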

### Databricks Solutions Used

| Purpose | Options |
|---------|---------|
| **Data Sync** | Delta Sharing, Spark {SOURCE_PLATFORM} Connector, Lakeflow Connect |
| **Analytics** | Databricks SQL, AI/BI Dashboards, Genie Spaces, SQL Warehouses |
| **AI/ML** | Mosaic AI, MLflow, Feature Store, Vector Search Indexes and Endpoints, Model Serving Endpoints |
| **Orchestration** | Lakeflow Jobs |

## Migration Approaches

Regardless of whether you choose ETL-First or BI-First, you must decide how to execute the migration: incrementally by workload or all at once.

<br />
<div class="mermaid">
flowchart LR
    A["How will you<br/>execute?"] --> B{"Complex<br/>environment?"}
    B -->|Yes| C["<b>Phased</b><br/>Workload-by-workload"]
    B -->|No| D{"Tight<br/>timeline?"}
    D -->|Yes| E["<b>Bulk</b><br/>Big-bang cutover"]
    D -->|No| C
    style C fill:#e8f5e9,stroke:#4caf50,stroke-width:2px
    style E fill:#fff3e0,stroke:#ff9800,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>
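The execution flowchart reduces to a two-input rule. This is a minimal sketch with hypothetical boolean inputs; note that Bulk is chosen only when the environment is simple and the timeline is tight, which is consistent with phased migration being the default recommendation.

```python
def choose_approach(complex_environment: bool, tight_timeline: bool) -> str:
    """Mirror the execution flowchart: Phased unless simple AND time-constrained."""
    if complex_environment:
        return "Phased"
    return "Bulk" if tight_timeline else "Phased"


for complex_env in (True, False):
    for tight in (True, False):
        print(complex_env, tight, choose_approach(complex_env, tight))
# True True Phased
# True False Phased
# False True Bulk
# False False Phased
```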


| Approach | Best For |
|----------|----------|
| **Phased** | Complex environments, many dependencies, risk-averse organizations, limited migration experience, business-critical workloads |
| **Bulk** | Simple environments, few dependencies, tight deadlines, small data volumes, dev/test environments |

**Comparing Phased vs Bulk Migration Approaches**

| Factor | Phased Migration | Bulk Migration |
|--------|------------------|----------------|
| **Risk** | Lower - issues contained | Higher - larger blast radius |
| **Duration** | Longer overall | Shorter cutover |
| **Complexity** | Higher (parallel systems) | Lower (single transition) |
| **Rollback** | Easier per phase | More difficult |
| **Cost** | Higher (overlap period) | Lower (no overlap) |

> **Recommendation**: A phased migration is generally recommended to mitigate risks and demonstrate progress early. Start with quick wins to build confidence before tackling complex workloads.

### Workload Prioritization

For phased migrations, prioritize workloads based on business value and migration complexity. Score each workload, plot it on the matrix, then sequence into waves.

<br />
<div class="mermaid">
quadrantChart
    title Prioritization Matrix
    x-axis Low Complexity --> High Complexity
    y-axis Low Value --> High Value
    quadrant-1 Major Projects
    quadrant-2 Quick Wins
    quadrant-3 Fill-ins
    quadrant-4 Consider Retiring
    Do these first: [0.30, 0.82]
    Plan carefully: [0.68, 0.71]
    Do these later: [0.22, 0.33]
    Do not migrate: [0.78, 0.19]
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>
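One way to place workloads on the matrix is to score value and complexity on a 0 to 1 scale and split each axis at a threshold. The sketch assumes those scores already exist (how you derive them, for example from business criticality or pipeline count, is up to your team) and uses the quadrant names from the chart. The 0.5 threshold and the workload names are hypothetical; the sample scores reuse the coordinates plotted in the chart.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    value: float       # 0.0 (low) to 1.0 (high) business value
    complexity: float  # 0.0 (low) to 1.0 (high) migration complexity


def quadrant(w: Workload, threshold: float = 0.5) -> str:
    """Map a workload to its prioritization-matrix quadrant."""
    high_value = w.value >= threshold
    high_complexity = w.complexity >= threshold
    if high_value and not high_complexity:
        return "Quick Wins"
    if high_value and high_complexity:
        return "Major Projects"
    if not high_value and not high_complexity:
        return "Fill-ins"
    return "Consider Retiring"


workloads = [
    Workload("daily_sales_dashboard", value=0.82, complexity=0.30),
    Workload("finance_close_pipeline", value=0.71, complexity=0.68),
    Workload("ad_hoc_exports", value=0.33, complexity=0.22),
    Workload("legacy_stored_procs", value=0.19, complexity=0.78),
]
for w in workloads:
    print(f"{w.name}: {quadrant(w)}")
# daily_sales_dashboard: Quick Wins
# finance_close_pipeline: Major Projects
# ad_hoc_exports: Fill-ins
# legacy_stored_procs: Consider Retiring
```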

### Workload Sequencing

Once workloads are scored, sequence them into migration waves:

| Wave | Focus | Objective |
|------|-------|-----------|
| **Wave 1** | Quick Wins | Low complexity, high visibility - build confidence and patterns |
| **Wave 2** | Core Workloads | Medium complexity, high value - apply learnings from Wave 1 |
| **Wave 3** | Complex Workloads | High complexity, high value - experienced team, established patterns |
| **Wave 4** | Cleanup | Remaining workloads - consider retiring low-value items |
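Wave assignment can be made mechanical once each workload has value and complexity scores. This is a minimal sketch under hypothetical assumptions: scores on a 0 to 1 scale, a value cutoff of 0.5 below which a workload goes to the Wave 4 cleanup list, and complexity cut points at 0.4 and 0.7 separating Waves 1, 2, and 3. Real sequencing would also account for dependencies between workloads, which this sketch ignores.

```python
from typing import Dict, List, Tuple

# (name, value, complexity) with scores on a 0.0-1.0 scale; names are hypothetical.
Workload = Tuple[str, float, float]


def assign_wave(value: float, complexity: float) -> int:
    """Hypothetical cut points: value 0.5, complexity 0.4 and 0.7."""
    if value < 0.5:
        return 4  # cleanup: migrate last or consider retiring
    if complexity < 0.4:
        return 1  # quick wins
    if complexity < 0.7:
        return 2  # core workloads
    return 3      # complex, high-value workloads


def plan_waves(workloads: List[Workload]) -> Dict[int, List[str]]:
    """Group workloads by wave, highest value first within each wave."""
    plan: Dict[int, List[str]] = {1: [], 2: [], 3: [], 4: []}
    for name, value, complexity in sorted(workloads, key=lambda w: -w[1]):
        plan[assign_wave(value, complexity)].append(name)
    return plan


inventory = [
    ("sales_dashboard", 0.9, 0.2),
    ("customer_360", 0.8, 0.5),
    ("finance_close", 0.85, 0.8),
    ("marketing_exports", 0.3, 0.3),
    ("inventory_report", 0.6, 0.3),
]
print(plan_waves(inventory))
# {1: ['sales_dashboard', 'inventory_report'], 2: ['customer_360'], 3: ['finance_close'], 4: ['marketing_exports']}
```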

## Summary and Key Takeaways

### Strategy Options

| Strategy | Best For | Risk Level |
|----------|----------|------------|
| **ETL-First** | Cost-driven migrations | Medium |
| **BI-First** | Capability-driven migrations | Medium |

### Migration Approaches

| Approach | Best For | Risk Level |
|----------|----------|------------|
| **Phased** (workload-by-workload) | Complex environments with many dependencies | Low |
| **Bulk** (big-bang) | Simple, time-constrained migrations | High |

### Key Decisions

1. What is your primary migration driver?
2. How complex is your environment?
3. What is your risk tolerance?
4. What resources are available?

### Next Steps

- Proceed to [**1.3 - Discovery Checklist**]($./1.3 - Discovery Checklist) for migration requirements gathering

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>
