<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">03 - Execute</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Interoperability Patterns

## Overview

During migration, **{SOURCE_PLATFORM}** and **Databricks** often need to run in parallel. This module introduces data interoperability patterns that enable safe coexistence, leveraging open table formats and Unity Catalog's federation capabilities to minimize risk and duplication.

## Learning Objectives

By the end of this lesson, you will be able to:
- Define data interoperability and understand its business value
- Identify common interoperability challenges and how open formats solve them
- Configure External Access via Iceberg REST API for external engines to query Unity Catalog
- Implement Lakehouse Federation to query external catalogs from Databricks
- Select the appropriate coexistence pattern for your migration
- Evaluate risk profiles and plan transition timelines

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 16px 20px; border-radius: 4px; margin: 16px 0;">
    <div style="display: flex; align-items: flex-start; gap: 12px;">
        <span style="font-size: 24px;">ℹ️</span>
        <div>
            <strong style="color: #0d47a1; font-size: 1.1em;">Coexistence may not be required</strong>
            <p style="margin: 8px 0 0 0; color: #333;">
                For smaller or less complex migrations, coexistence can often be bypassed. If your organization can execute a rapid cutover with acceptable risk, consider moving directly to the <strong>Migrate</strong> and <strong>Activate</strong> phases.
            </p>
        </div>
    </div>
</div>

## Why Coexistence Is Often Necessary

Most enterprise migrations cannot happen overnight. Understanding the factors that drive coexistence helps you plan appropriately.

| Factor | Implication |
|--------|-------------|
| **Complex Dependencies** | Workloads with upstream/downstream relationships must migrate in sequence |
| **Validation Requirements** | Business-critical data needs parallel validation before cutover |
| **Business Continuity** | Production systems cannot tolerate disruption |
| **Team Capacity** | Phased approach spreads work and builds expertise |
| **Risk Management** | Rollback capability needed until confidence is established |

### Typical Coexistence Duration

The length of coexistence depends on migration scope and complexity:

| Migration Size | Object Count | Typical Duration | Recommended Approach |
|----------------|--------------|------------------|----------------------|
| **Small** | < 50 objects | 1-2 months | Consider skipping coexistence |
| **Medium** | 50-500 objects | 3-6 months | Targeted coexistence |
| **Large** | 500+ objects | 6-12+ months | Full coexistence patterns |

> **Tip**: Use the complexity scores to estimate duration more accurately.

## Common Interoperability Challenges

Before open table formats, cross-platform data access faced significant obstacles that increased cost and complexity.

| Challenge | Impact | Traditional Workaround |
|-----------|--------|------------------------|
| **Format Incompatibility** | Proprietary formats create vendor lock-in | Export/import with data loss |
| **Schema Evolution** | Different platforms handle schema changes differently | Manual reconciliation |
| **Access Barriers** | Authentication complexity across systems | Multiple credentials, security gaps |
| **Governance Fragmentation** | Policies don't cross platform boundaries | Duplicate policy management |
| **Performance Bottlenecks** | Cross-platform queries are slow | Data duplication |

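Open table formats like **Delta Lake** and **Apache Iceberg**, combined with **Unity Catalog**, address these challenges at the architecture level: a shared table format removes export/import steps, and a single catalog centralizes access control and governance.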

## Interoperability Patterns Overview

There are two primary patterns for data interoperability during migration. The key distinction is **who initiates the connection** and **where compute runs**.

| Pattern | Direction | Description | Compute Location | Access | Authentication |
|---------|-----------|-------------|------------------|--------|----------------|
| **Pattern 1: External Access via Iceberg REST** | External → Databricks | External engines (Snowflake, EMR, Trino) query Unity Catalog tables | External engine | Read/Write (permission dependent) | OAuth via Service Principal |
| **Pattern 2: Lakehouse Federation** | Databricks → External | Databricks queries foreign catalogs (HMS, Glue, Snowflake) | Databricks | Read-only | Service Credentials |

Choose your interoperability pattern based on migration drivers and constraints. The right pattern balances risk, cost, and operational complexity.

# Pattern 1: External Access via Iceberg REST

## Overview

This pattern enables external compute engines like **{SOURCE_PLATFORM}** to query and write to Unity Catalog-managed tables via the **Iceberg REST API**. Databricks owns the data and governance; external engines provide compute.

### Centralized Governance with External Engine Compute

The diagram below shows the flow: external engines (Snowflake, EMR, Trino) get metadata and permissions from Unity Catalog, then read and write the table data in cloud storage directly.

<br />

<div id="c4-external-access-diagram"></div>

<script>
(function() {
  const puml = `@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml

HIDE_STEREOTYPE()

title Centralized Governance with UC and External Engine Compute

Person(engineers, "Data Engineers", "Develop and maintain pipelines")
Person(analysts, "Data Analysts", "Query and analyze data")

System_Ext(external, "External Compute Engine", "Snowflake, AWS EMR, Trino, etc.")

System(uc, "Unity Catalog", "Central governance and metadata management layer")

System_Ext(storage, "Managed Iceberg Tables", "Data stored in cloud storage")

Rel(engineers, external, "Implements pipelines using")
Rel(analysts, external, "Performs analytics using")
Rel(external, uc, "Gets metadata and permissions from")
Rel(engineers, uc, "Leverages for governance")
Rel(uc, storage, "Governs and tracks")
Rel(external, storage, "Reads/writes data to")

@enduml`;

  const encoded = Array.from(new TextEncoder().encode(puml))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');

  const img = document.createElement('img');
  img.src = `https://www.plantuml.com/plantuml/svg/~h${encoded}`;
  img.alt = 'Centralized Governance with External Engine Compute';
  img.style.maxWidth = '100%';
  document.getElementById('c4-external-access-diagram').appendChild(img);
})();
</script>

### Pattern 1 Summary

| Aspect | Details |
|--------|---------|
| **Direction** | External → Databricks |
| **Data Ownership** | Databricks (Unity Catalog) |
| **Compute** | External engine (Snowflake, EMR, Trino) |
| **Access Mode** | Read/Write (permission dependent) |
| **Authentication** | OAuth via Service Principal |
| **Format** | Iceberg native, Delta via UniForm |

**Use when:** You want Databricks as the system of record with external engines querying or writing to governed datasets.

### How It Works

1. **Service Principal** - Create OAuth credentials in Databricks for external authentication
2. **Iceberg REST API** - Unity Catalog exposes tables via standard Iceberg REST protocol
3. **Vended Credentials** - UC provides temporary cloud storage credentials to external engines
4. **UniForm** - Delta tables with UniForm enabled generate Iceberg metadata alongside the Delta log so that Iceberg clients can read them
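
The first two steps above can be exercised from a shell before configuring any engine. The following is a minimal sketch, not a definitive procedure. It assumes the workspace-level OAuth token endpoint `/oidc/v1/token`, the Unity Catalog Iceberg REST base path `/api/2.1/unity-catalog/iceberg-rest` (confirm this against your workspace documentation, since the path may differ between releases), and the standard Iceberg REST `GET /v1/config` route with the UC catalog name passed as the warehouse. The helper names and the `RUN_LIVE`, `WORKSPACE_HOST`, `CLIENT_ID`, `CLIENT_SECRET`, and `UC_CATALOG` variables are hypothetical, and `jq` is assumed to be installed for live runs.

```shell
#!/usr/bin/env bash
# Sketch only: endpoint paths are assumptions to confirm against your workspace docs.

# Build the OAuth token endpoint URL for a workspace host (hypothetical helper).
token_url() {
  printf 'https://%s/oidc/v1/token\n' "$1"
}

# Build the Iceberg REST config URL; the UC catalog name is passed as the warehouse.
iceberg_config_url() {
  printf 'https://%s/api/2.1/unity-catalog/iceberg-rest/v1/config?warehouse=%s\n' "$1" "$2"
}

# Network calls run only when RUN_LIVE=1, so the script is safe to dry-run.
if [ "${RUN_LIVE:-0}" = "1" ]; then
  # Step 1: exchange the service principal's client ID and secret for a token.
  token=$(curl -sS --fail \
    --user "${CLIENT_ID}:${CLIENT_SECRET}" \
    --data 'grant_type=client_credentials&scope=all-apis' \
    "$(token_url "${WORKSPACE_HOST}")" | jq -r '.access_token')

  # Step 2: ask the Iceberg REST endpoint for its catalog configuration.
  curl -sS --fail \
    --header "Authorization: Bearer ${token}" \
    "$(iceberg_config_url "${WORKSPACE_HOST}" "${UC_CATALOG}")"
else
  # Dry run: print the URLs that would be called.
  token_url "${WORKSPACE_HOST:-example.cloud.databricks.com}"
  iceberg_config_url "${WORKSPACE_HOST:-example.cloud.databricks.com}" "${UC_CATALOG:-main}"
fi
```

Running the script without `RUN_LIVE=1` only prints the two URLs, which is a quick way to check the host and catalog values before any credentials are involved.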

## Pattern 1 Use Cases

| Use Case | Description |
|----------|-------------|
| **BI Tool Continuity** | {SOURCE_PLATFORM} dashboards continue to work during migration by querying UC tables |
| **Multi-Engine Analytics** | Data scientists use {SOURCE_PLATFORM} for SQL while ML engineers use Databricks |
| **Gradual Consumer Migration** | Migrate data to Databricks first, move consumers later |
| **External Data Sharing** | Partners query governed datasets without Databricks access |
| **Hybrid Workloads** | Some processing in {SOURCE_PLATFORM}, some in Databricks, same data |

## Pattern 1 Implementation Steps

### Step 1: Create a Service Principal in Databricks

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 12px 16px; border-radius: 4px; margin: 12px 0;">
    <strong>UI Steps:</strong> Settings → Identity and Access Management → Service Principals → Add New
</div>

<div class="code-block" data-language="sql">
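-- Grant permissions to the service principal (Unity Catalog privilege names)
GRANT USE CATALOG ON CATALOG {catalog_name} TO `{service_principal_name}`;
GRANT USE SCHEMA ON SCHEMA {catalog_name}.{schema_name} TO `{service_principal_name}`;
-- Granting on the schema covers all current and future tables in it
GRANT SELECT, MODIFY ON SCHEMA {catalog_name}.{schema_name} TO `{service_principal_name}`;
-- External engines also need EXTERNAL USE SCHEMA to receive vended credentials
GRANT EXTERNAL USE SCHEMA ON SCHEMA {catalog_name}.{schema_name} TO `{service_principal_name}`;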
</div>

### Step 2: Enable UniForm on Delta Tables (if needed)

<div class="code-block" data-language="sql">
ALTER TABLE {catalog_name}.{schema_name}.{table_name} 
SET TBLPROPERTIES (
  'delta.universalFormat.enabledFormats' = 'iceberg',
  'delta.enableIcebergCompatV2' = 'true'
);
</div>

### Step 3: Create Catalog Integration in {SOURCE_PLATFORM}

<div class="code-block" data-language="sql">
-- REPLACEME: {SOURCE_PLATFORM}-specific catalog integration
-- CREATE CATALOG INTEGRATION using Iceberg REST config
</div>

### Step 4: Define Tables in {SOURCE_PLATFORM}

<div class="code-block" data-language="sql">
-- REPLACEME: {SOURCE_PLATFORM}-specific table definition
-- CREATE ICEBERG TABLE referencing the catalog integration
</div>

### Step 5: Query UC Tables from {SOURCE_PLATFORM}

<div class="code-block" data-language="sql">
-- REPLACEME: {SOURCE_PLATFORM}-specific query examples
-- SELECT * FROM ...
-- INSERT INTO ...
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

### Video Walkthrough

<a href="https://drive.google.com/file/d/1_aBJNbeYqL_-8esFtBrU5aKxuhIuflTR/view?usp=drive_link" target="_blank" style="text-decoration: none;">
    <div style="width: 100%; max-width: 640px; margin: 16px 0; padding: 40px; background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); border-radius: 12px; cursor: pointer; text-align: center; box-shadow: 0 4px 12px rgba(0,0,0,0.2);">
        <div style="width: 80px; height: 80px; background: rgba(255,255,255,0.1); border-radius: 50%; margin: 0 auto 20px; display: flex; align-items: center; justify-content: center;">
            <div style="width: 0; height: 0; border-left: 30px solid #fff; border-top: 18px solid transparent; border-bottom: 18px solid transparent; margin-left: 8px;"></div>
        </div>
        <p style="color: #fff; font-size: 18px; margin: 0 0 8px 0; font-weight: 600;">Accessing UC Tables from Snowflake</p>
        <p style="color: #aaa; font-size: 14px; margin: 0;">Click to open video in new tab</p>
    </div>
</a>

# Pattern 2: Lakehouse Federation

## Overview

This pattern enables Databricks to query external catalogs like **{SOURCE_PLATFORM}**, Hive Metastore, or AWS Glue without moving data. The external system remains the source of record while Databricks provides compute and can join federated data with Unity Catalog tables.

### Federated Catalog Integration as a Migration Bridge

The diagram below shows the bridge: Unity Catalog federates with the legacy catalogs (Hive Metastore, AWS Glue), which continue to reference data in existing storage, while migrated tables land in Unity Catalog managed storage.

<br />

<div id="c4-federation-diagram"></div>

<script>
(function() {
  const puml = `@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml

HIDE_STEREOTYPE()

title Migrating to UC using Federated Catalog Integration as a Bridge

Person(engineers, "Data Engineers", "Migrate and manage data assets")
Person(analysts, "Data Analysts", "Query data during migration")

System(uc, "Unity Catalog", "Target state: centralized governance layer")

System(federated, "Federated Catalog", "Bridge for migration")

System_Ext(storage, "Databricks Storage", "Target UC managed storage")

System_Ext(hms, "Hive Metastore", "Legacy catalog")
System_Ext(glue, "AWS Glue", "Legacy catalog")

System_Ext(legacy, "Legacy Data", "Data in existing storage")

Rel(engineers, uc, "Configures and migrates to")
Rel(analysts, uc, "Queries data through")
Rel(engineers, federated, "Sets up federation between catalogs")
Rel(uc, federated, "Federates with")
Rel(uc, storage, "Manages")
Rel(federated, hms, "Connects to")
Rel(federated, glue, "Connects to")
Rel(hms, legacy, "References")
Rel(glue, legacy, "References")

@enduml`;

  const encoded = Array.from(new TextEncoder().encode(puml))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');

  const img = document.createElement('img');
  img.src = `https://www.plantuml.com/plantuml/svg/~h${encoded}`;
  img.alt = 'Federated Catalog Integration as Migration Bridge';
  img.style.maxWidth = '100%';
  document.getElementById('c4-federation-diagram').appendChild(img);
})();
</script>

### Pattern 2 Summary

| Aspect | Details |
|--------|---------|
| **Direction** | Databricks → External |
| **Data Ownership** | External system (during migration) |
| **Compute** | Databricks |
| **Access Mode** | Read-only (query pushdown where possible) |
| **Authentication** | Service Credentials (IAM roles) |
| **Supported Sources** | HMS, AWS Glue, Snowflake, MySQL, PostgreSQL |

**Use when:** You need to query legacy data from Databricks during a phased migration without upfront data movement.

### How It Works

1. **Service Credential** - IAM role or secrets for authenticating to external catalog
2. **Storage Credential** - IAM role for accessing underlying data in cloud storage
3. **Connection** - Defines how Databricks connects to the external catalog
4. **Foreign Catalog** - Unity Catalog object that mirrors external catalog metadata

### Query Pushdown

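Where possible, Databricks pushes predicates and aggregations to the external system to minimize data transfer. Pushdown applies to sources that have their own query engine, such as Snowflake, MySQL, or PostgreSQL. For catalog sources such as HMS or AWS Glue, Databricks reads the underlying files directly from cloud storage, which is why a storage credential is required.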

## Pattern 2 Use Cases

| Use Case | Description |
|----------|-------------|
| **Query Before Migration** | Access {SOURCE_PLATFORM} data from Databricks without moving it first |
| **Cross-Platform Joins** | Join {SOURCE_PLATFORM} tables with UC tables in a single query |
| **Gradual Data Migration** | Query federated data while migrating tables incrementally |
| **Legacy System Access** | Maintain read access to legacy HMS or Glue catalogs |
| **Validation During Migration** | Compare source and target data using federated queries |

## Pattern 2 Implementation Steps

### Step 1: Create IAM Role for Catalog Access (AWS Example)

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 12px 16px; border-radius: 4px; margin: 12px 0;">
    <strong>IAM Setup:</strong> Create an IAM role whose trust policy lets Unity Catalog assume it, then attach a permissions policy that grants read access to Glue metadata
</div>

<div class="code-block" data-language="json">
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    }
  ]
}
</div>

### Step 2: Create Service Credential

<div class="code-block" data-language="bash">
databricks credentials create-credential \
--json '{
  "name": "{source_platform}_service_credential", 
  "purpose": "SERVICE", 
  "aws_iam_role": { 
    "role_arn": "arn:aws:iam::{account_id}:role/{role_name}"
  }
}'
</div>

### Step 3: Create Storage Credential (for data access)

<div class="code-block" data-language="bash">
databricks credentials create-credential \
--json '{
  "name": "{source_platform}_storage_credential", 
  "purpose": "STORAGE", 
  "aws_iam_role": { 
    "role_arn": "arn:aws:iam::{account_id}:role/{storage_role_name}"
  }
}'
</div>

### Step 4: Create External Location

<div class="code-block" data-language="bash">
databricks external-locations create \
  {location_name} \
  's3://{bucket_name}/' \
  {storage_credential_name} \
  --skip-validation
</div>

### Step 5: Create Connection

<div class="code-block" data-language="sql">
CREATE CONNECTION {source_platform}_connection
  TYPE {CONNECTION_TYPE}
  OPTIONS (
    -- REPLACEME: Source-specific connection options
  );
</div>

### Step 6: Create Foreign Catalog

<div class="code-block" data-language="sql">
CREATE FOREIGN CATALOG {foreign_catalog_name} 
  USING CONNECTION {source_platform}_connection
  OPTIONS (authorized_paths 's3://{bucket_name}');
</div>

### Step 7: Query Federated Data

<div class="code-block" data-language="sql">
-- Query external table
SELECT * FROM {foreign_catalog_name}.{schema_name}.{table_name};

-- Join with a UC table (assumes both tables share an id column)
SELECT f.*, u.*
FROM {foreign_catalog_name}.{schema_name}.{table_name} f
JOIN {catalog_name}.{schema_name}.{table_name} u ON f.id = u.id;
</div>
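
The federated queries above also support validation during migration. As a hedged sketch, the hypothetical `validation_sql` helper below prints a row-count comparison for one table. It assumes the table keeps the same schema and table name in both the foreign catalog and the Unity Catalog target. The catalog, schema, and table names in the loop are placeholders. Row counts are only a first check and do not prove that the contents match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit a row-count comparison between source and target.
# Arguments: foreign catalog, UC catalog, schema, table.
validation_sql() {
  local foreign_catalog="$1" uc_catalog="$2" schema="$3" table="$4"
  cat <<SQL
SELECT '${schema}.${table}' AS table_name,
       (SELECT COUNT(*) FROM ${foreign_catalog}.${schema}.${table}) AS source_rows,
       (SELECT COUNT(*) FROM ${uc_catalog}.${schema}.${table}) AS target_rows;
SQL
}

# Generate one statement per migrated table (placeholder names).
for table in orders customers; do
  validation_sql legacy_glue main sales "$table"
done
```

The generated statements can be pasted into a Databricks SQL editor or notebook. Any table where `source_rows` and `target_rows` differ is a candidate for closer inspection.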

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-json.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Next Steps

With interoperability patterns in place, you're ready to prepare for data replication:

- **[3.2 - Schema and DDL Conversion]($../03 - Execute/3.2 - Schema and DDL Conversion)** - Learn how to migrate data structures in preparation for replication and ingestion

<div style="color: #FF3621; font-weight: bold; font-size: 2em; margin-bottom: 12px;">COURSE DEVELOPER (remove before publishing)</div>

### Template Customization

**Placeholders to replace:**
- `{SOURCE_PLATFORM}` - Source platform name (Snowflake, BigQuery, Redshift)
- `{CONNECTION_TYPE}` - GLUE, SNOWFLAKE, MYSQL, POSTGRESQL, etc.
- `{catalog_name}`, `{schema_name}`, `{table_name}` - Example object names
- Header banner icon URL to match source platform

**Pattern 1 platform-specific implementation (Steps 3-5):**

| Platform | Catalog Integration | Table Definition |
|----------|---------------------|------------------|
| **Snowflake** | `CREATE CATALOG INTEGRATION ... CATALOG_SOURCE=ICEBERG_REST` | `CREATE ICEBERG TABLE ... CATALOG='integration_name'` |
| **EMR/Spark** | Spark config: `spark.sql.catalog.uc.catalog-impl=org.apache.iceberg.rest.RESTCatalog` | Standard Iceberg SQL |
| **Trino** | Iceberg connector config in catalog properties | Standard Iceberg SQL |

**Pattern 2 platform-specific connection options (Step 5):**

| Platform | Connection Type | Key Options |
|----------|-----------------|-------------|
| **AWS Glue** | `GLUE` | `aws_account_id`, `aws_region`, `credential` |
| **Snowflake** | `SNOWFLAKE` | `host`, `port`, `sfWarehouse`, `user`, `password` |
| **HMS** | `HIVE_METASTORE` | `host`, `port` |
| **MySQL** | `MYSQL` | `host`, `port`, `user`, `password` |
| **PostgreSQL** | `POSTGRESQL` | `host`, `port`, `user`, `password` |

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>
