<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">02 - Design</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Platform Setup and Environment

## Overview

This module covers the foundational setup of your Databricks environment, including Unity Catalog configuration, workspace creation, compute setup, and establishing the infrastructure that will support your migration.

## Learning Objectives

By the end of this lesson, you will be able to:
- Enable and configure Unity Catalog metastore
- Create and organize workspaces
- Set up compute resources (SQL warehouses, clusters, instance pools)
- Configure service credentials for data access
- Establish naming conventions and organizational standards
- Bind workspaces to Unity Catalog

## Foundation Setup Overview

The platform foundation must be established before data migration begins. This is "Wave 0" from your migration planning.

<br />
<div class="mermaid">
flowchart LR
    subgraph SETUP["Platform Setup Sequence"]
        direction TB
        A["1. Enable<br/>Unity Catalog"]
        B["2. Create<br/>Metastore"]
        C["3. Create<br/>Workspaces"]
        D["4. Bind Workspace<br/>to Metastore"]
        E["5. Configure<br/>Compute"]
        F["6. Set Up Service<br/>Credentials"]
    end
    A --> B --> C --> D --> E --> F
    style SETUP fill:#fff,stroke:#FF3621,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

| Phase | Activity | Output |
|-------|----------|--------|
| **1. Unity Catalog** | Enable UC at account level | UC enabled for account |
| **2. Metastore** | Create metastore in target region | Centralized catalog and governance |
| **3. Workspaces** | Create development, staging, production workspaces | Isolated compute environments |
| **4. Binding** | Attach workspaces to metastore | Unified data access across environments |
| **5. Compute** | Configure SQL warehouses, clusters, policies | Ready for workload execution |
| **6. Credentials** | Set up IAM roles and service principals | Secure access to cloud storage |
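
The phase order in the table can be treated as a dependency chain. The following is a minimal sketch, assuming each phase depends only on the one before it; the phase keys and helper names are hypothetical shorthand for the table rows, not Databricks identifiers.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases that must finish before it starts.
# Keys are illustrative shorthand for the rows of the setup table.
SETUP_DEPENDENCIES = {
    "unity_catalog": set(),
    "metastore": {"unity_catalog"},
    "workspaces": {"metastore"},
    "binding": {"workspaces"},
    "compute": {"binding"},
    "credentials": {"compute"},
}


def setup_order(dependencies):
    """Return the phases in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def next_phase(dependencies, completed):
    """Return the first phase that is not yet completed, or None if all are done."""
    for phase in setup_order(dependencies):
        if phase not in completed:
            return phase
    return None


if __name__ == "__main__":
    print(setup_order(SETUP_DEPENDENCIES))
    print(next_phase(SETUP_DEPENDENCIES, {"unity_catalog", "metastore"}))
```

Expressing the sequence as dependencies rather than a fixed list makes it easier to add a phase later without renumbering the others.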

## Unity Catalog Metastore Setup

Unity Catalog provides centralized governance across all workspaces. The metastore is the top-level container for all metadata.

### Step 1: Enable Unity Catalog

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 12px 16px; border-radius: 4px; margin: 12px 0;">
    <strong>Account Admin Required:</strong> Only account admins can enable Unity Catalog
</div>

**Prerequisites:**
- Premium pricing tier or above (on AWS, the account must also be on the E2 platform)
- Account admin privileges
- Cloud storage location for metastore

### Step 2: Create Metastore

<div class="code-block" data-language="bash">
# Using Databricks CLI
databricks unity-catalog metastores create \
  --name production_metastore \
  --storage-root s3://my-org-metastore/ \
  --region us-west-2
</div>

| Parameter | Description | Example |
|-----------|-------------|---------|
| `name` | Metastore identifier | `prod_metastore`, `dev_metastore` |
| `storage-root` | Cloud storage path | `s3://bucket/`, `abfss://container@account.dfs.core.windows.net/` |
| `region` | Cloud region | `us-west-2`, `eastus`, `europe-west1` |
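
Before running the create command, it can help to sanity-check the three parameters locally. This is a rough sketch: the function name, the naming rule (letters, digits, underscores), and the inclusion of `gs` for GCP are assumptions made for illustration, not rules enforced by Databricks.

```python
from urllib.parse import urlparse

# URI schemes from the examples in the parameter table, plus gs:// for GCP.
# gs is an assumption; the table shows no GCP storage path.
ALLOWED_SCHEMES = {"s3", "abfss", "gs"}


def metastore_param_errors(name, storage_root, region):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    # Hypothetical naming rule: letters, digits, and underscores only.
    if not name or not name.replace("_", "").isalnum():
        errors.append("name must contain only letters, digits, and underscores")
    parsed = urlparse(storage_root)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        errors.append("storage-root must be a cloud storage URI such as s3://bucket/")
    if not region:
        errors.append("region must not be empty")
    return errors


if __name__ == "__main__":
    print(metastore_param_errors("production_metastore", "s3://my-org-metastore/", "us-west-2"))
    print(metastore_param_errors("prod metastore", "/tmp/metastore", ""))
```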

### Step 3: Assign Metastore Admin

<div class="code-block" data-language="sql">
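-- Grant all metastore-level privileges to a group.
-- This does not transfer the metastore admin role itself: that role belongs to
-- the metastore owner, which an account admin sets in the account console.
GRANT ALL PRIVILEGES ON METASTORE TO `metastore-admins`;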
</div>

> **Best Practice:** Use groups for metastore admin assignment, not individual users.

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Workspace Setup

Workspaces provide isolated compute environments. Organize by environment (dev/staging/prod) or by team/domain.

### Organizational Patterns

| Pattern | Structure | Use Case |
|---------|-----------|----------|
| **Environment-Based** | `dev-workspace`, `staging-workspace`, `prod-workspace` | Traditional SDLC with promotion |
| **Domain-Based** | `sales-workspace`, `marketing-workspace`, `finance-workspace` | Domain-oriented data teams |
| **Hybrid** | `sales-prod`, `sales-dev`, `marketing-prod` | Both domain separation and SDLC |

### Create Workspace

<div style="border-left: 4px solid #1976d2; background: #e3f2fd; padding: 12px 16px; border-radius: 4px; margin: 12px 0;">
    <strong>Account Console:</strong> Account → Workspaces → Create Workspace
</div>

| Setting | Recommendation |
|---------|----------------|
| **Name** | Use descriptive naming: `{org}-{env}-{purpose}` |
| **Cloud** | AWS, Azure, or GCP |
| **Region** | Must match the metastore region; a metastore can only be assigned to workspaces in the same region |
| **Pricing Tier** | Enterprise or Premium for UC support |
| **Network** | Private Link for production environments |

### Bind Workspace to Metastore

<div class="code-block" data-language="bash">
# Assign metastore to workspace
databricks unity-catalog metastores assign \
  --workspace-id 12345678901234 \
  --metastore-id abc-def-12345
</div>

> **Important:** Once bound, workspace users can access catalogs they have permissions for.

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Catalog and Schema Creation

Organize data using Unity Catalog's three-level namespace: `catalog.schema.table`.

### Catalog Strategy

| Strategy | Approach | Example |
|----------|----------|---------|
| **Environment** | One catalog per environment | `dev`, `staging`, `prod` |
| **Domain** | One catalog per business domain | `sales`, `marketing`, `finance` |
| **Medallion** | One catalog per layer | `bronze`, `silver`, `gold` |
| **Hybrid** | Combine approaches | `prod_sales`, `prod_marketing` |
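
For the hybrid strategy, the set of catalog names is the cross product of environments and domains. A minimal sketch, assuming the `{env}_{domain}` pattern shown in the Hybrid row; the function name is hypothetical.

```python
from itertools import product


def hybrid_catalogs(environments, domains):
    """Return one catalog name per (environment, domain) pair, e.g. prod_sales."""
    return [f"{env}_{domain}" for env, domain in product(environments, domains)]


if __name__ == "__main__":
    for name in hybrid_catalogs(["dev", "prod"], ["sales", "marketing"]):
        print(name)
```

Listing the names up front shows how quickly the catalog count grows: three environments and five domains already yield fifteen catalogs.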

### Create Catalog and Schemas

<div class="code-block" data-language="sql">
-- Create catalog
CREATE CATALOG IF NOT EXISTS prod_catalog
COMMENT 'Production data catalog';

-- Create schemas for medallion architecture
CREATE SCHEMA IF NOT EXISTS prod_catalog.bronze
COMMENT 'Raw ingested data';

CREATE SCHEMA IF NOT EXISTS prod_catalog.silver
COMMENT 'Cleansed and conformed data';

CREATE SCHEMA IF NOT EXISTS prod_catalog.gold
COMMENT 'Business-level aggregates';

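-- Grant access (Unity Catalog privileges are USE CATALOG and USE SCHEMA; USAGE is the legacy name)
GRANT USE CATALOG ON CATALOG prod_catalog TO `data-engineers`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.bronze TO `data-engineers`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.silver TO `data-engineers`;
-- Analysts also need USE CATALOG on the parent catalog to reach the gold schema
GRANT USE CATALOG ON CATALOG prod_catalog TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.gold TO `data-analysts`;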
</div>
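
When the same medallion layout is needed in several catalogs (for example one per environment), the `CREATE` statements can be generated rather than hand-written. The sketch below only builds SQL strings; the function name is hypothetical, and how the statements are executed (SQL editor, notebook, or a connector) is left open.

```python
# Schema names and comments copied from the SQL example above.
MEDALLION_LAYERS = {
    "bronze": "Raw ingested data",
    "silver": "Cleansed and conformed data",
    "gold": "Business-level aggregates",
}


def medallion_ddl(catalog, layers=MEDALLION_LAYERS):
    """Return CREATE CATALOG / CREATE SCHEMA statements for one catalog."""
    # Reject anything that is not a plain identifier to avoid malformed SQL.
    if not catalog.isidentifier():
        raise ValueError(f"invalid catalog name: {catalog!r}")
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog};"]
    for schema, comment in layers.items():
        escaped = comment.replace("'", "''")  # escape single quotes in the comment
        statements.append(
            f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema} COMMENT '{escaped}';"
        )
    return statements


if __name__ == "__main__":
    for statement in medallion_ddl("dev_catalog"):
        print(statement)
```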

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Compute Configuration

Configure SQL warehouses for BI/analytics and clusters for data engineering workloads.

### SQL Warehouse Setup

SQL warehouses provide compute for SQL workloads and come in three types: serverless, pro, and classic.

| Type | Use Case | Characteristics |
|------|----------|-----------------|
| **Serverless** | BI, ad-hoc analytics, dashboards | Instant startup, auto-scaling, fully managed |
| **Pro** | Production BI with consistent load | Customer-managed, predictable performance |
| **Classic** | Legacy compatibility | Customer-managed instances |

<div class="code-block" data-language="bash">
# Create serverless SQL warehouse
databricks sql-warehouses create \
  --name "Production BI Warehouse" \
  --cluster-size "2X-Small" \
  --enable-serverless-compute \
  --auto-stop-mins 10
</div>

### Cluster Configuration

For data engineering workloads using notebooks and Spark jobs:

| Cluster Type | Use Case | Configuration |
|--------------|----------|---------------|
| **All-Purpose** | Interactive development, notebooks | Unity Catalog-enabled access mode (single user or shared), auto-termination |
| **Jobs** | Scheduled pipelines, automation | Single node or multi-node, auto-terminates |
| **Instance Pool** | Faster startup for frequent jobs | Pre-warmed instances |

<div class="code-block" data-language="json">
{
  "cluster_name": "data-engineering-cluster",
  "spark_version": "14.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "autotermination_minutes": 120,
  "spark_conf": {
    "spark.databricks.delta.preview.enabled": "true"
  }
}
</div>

### Cluster Policies

Enforce governance and cost controls using cluster policies:

<div class="code-block" data-language="json">
{
  "name": "Standard Data Engineering Policy",
  "definition": {
    "spark_version": {
      "type": "fixed",
      "value": "14.3.x-scala2.12"
    },
    "node_type_id": {
      "type": "allowlist",
      "values": ["i3.xlarge", "i3.2xlarge", "i3.4xlarge"]
    },
    "autoscale.max_workers": {
      "type": "range",
      "minValue": 2,
      "maxValue": 20
    }
  }
}
</div>
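
To illustrate how the three rule types in the policy constrain a cluster definition, here is a simplified local pre-check. It is a sketch only: Databricks evaluates policies server-side with more rule types and options than shown here, and the function names are hypothetical. The older `whitelist` spelling is accepted alongside `allowlist`.

```python
# Policy rules mirroring the example above; keys are dotted paths into the cluster config.
POLICY_DEFINITION = {
    "spark_version": {"type": "fixed", "value": "14.3.x-scala2.12"},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge", "i3.4xlarge"]},
    "autoscale.max_workers": {"type": "range", "minValue": 2, "maxValue": 20},
}


def _lookup(config, dotted_path):
    """Follow a dotted path such as 'autoscale.max_workers' into nested dicts."""
    current = config
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def policy_violations(config, definition):
    """Return a list of messages for every rule the cluster config breaks."""
    violations = []
    for path, rule in definition.items():
        value = _lookup(config, path)
        kind = rule["type"]
        if kind == "fixed" and value != rule["value"]:
            violations.append(f"{path} must be {rule['value']!r}, got {value!r}")
        elif kind in ("allowlist", "whitelist") and value not in rule["values"]:
            violations.append(f"{path} value {value!r} is not in the allowed list")
        elif kind == "range":
            low, high = rule.get("minValue"), rule.get("maxValue")
            out_of_range = (
                value is None
                or (low is not None and value < low)
                or (high is not None and value > high)
            )
            if out_of_range:
                violations.append(f"{path} value {value!r} is outside [{low}, {high}]")
    return violations


if __name__ == "__main__":
    cluster = {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "m5.large",
        "autoscale": {"min_workers": 2, "max_workers": 50},
    }
    for message in policy_violations(cluster, POLICY_DEFINITION):
        print(message)
```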

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Service Credentials and Access

Configure service principals and storage credentials for secure access to cloud storage.

### Service Principal Creation

<div class="code-block" data-language="bash">
# Create service principal
databricks service-principals create \
  --display-name "migration-service-principal"

# Generate a personal access token on behalf of the service principal
# (takes the service principal's application ID)
databricks token-management create-obo-token <application-id> \
  --lifetime-seconds 31536000 \
  --comment "Migration automation"
</div>

### Storage Credential Setup (AWS Example)

<div class="code-block" data-language="sql">
-- Create storage credential
CREATE STORAGE CREDENTIAL migration_storage_credential
WITH (
  AWS_IAM_ROLE 'arn:aws:iam::123456789012:role/databricks-migration-role'
);

-- Allow the service principal to create external locations that use this credential
GRANT CREATE EXTERNAL LOCATION ON STORAGE CREDENTIAL migration_storage_credential 
TO `migration-service-principal`;
</div>

### External Location

<div class="code-block" data-language="sql">
-- Create external location for landing zone
CREATE EXTERNAL LOCATION landing_zone
URL 's3://my-org-landing-zone/'
WITH (STORAGE CREDENTIAL migration_storage_credential);

-- Grant access
GRANT READ FILES, WRITE FILES ON EXTERNAL LOCATION landing_zone 
TO `data-engineers`;
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
    document.querySelectorAll('.code-block').forEach(function(block) {
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substr(2, 9);
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Naming Conventions and Standards

Establish consistent naming patterns for governance and discoverability.

### Recommended Naming Patterns

| Object Type | Pattern | Example |
|-------------|---------|---------|
| **Catalog** | `{env}_{domain}` | `prod_sales`, `dev_analytics` |
| **Schema** | `{layer}` or `{subdomain}` | `bronze`, `silver`, `gold`, `raw`, `curated` |
| **Table** | `{source}_{entity}` | `salesforce_accounts`, `stripe_invoices` |
| **View** | `{purpose}_view` or `v_{name}` | `active_customers_view`, `v_daily_sales` |
| **SQL Warehouse** | `{purpose}_{env}` | `bi_prod`, `analytics_dev` |
| **Cluster** | `{team}_{purpose}` | `eng_etl`, `ds_training` |
| **Job** | `{pipeline}_{frequency}` | `sales_etl_daily`, `inventory_sync_hourly` |

### Tagging Strategy

Apply tags for cost tracking, ownership, and governance:

| Tag Key | Purpose | Example Values |
|---------|---------|----------------|
| `env` | Environment | `dev`, `staging`, `prod` |
| `owner` | Team or person responsible | `data-engineering`, `analytics` |
| `cost-center` | Chargeback code | `CC-1234`, `sales-dept` |
| `project` | Migration wave or project | `wave1`, `crm-migration` |
| `criticality` | Business criticality | `critical`, `high`, `medium`, `low` |
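
A tagging strategy only helps if resources actually carry the tags. The sketch below checks a tag dictionary against the keys in the table; treating all five keys as required and restricting `env` and `criticality` to the example values are assumptions for illustration.

```python
# Tag keys from the table above; all five are treated as required here (an assumption).
REQUIRED_TAGS = ("env", "owner", "cost-center", "project", "criticality")

# Keys whose values are limited to a closed set, taken from the example values.
ALLOWED_VALUES = {
    "env": {"dev", "staging", "prod"},
    "criticality": {"critical", "high", "medium", "low"},
}


def tag_problems(tags):
    """Return messages for missing tags first, then for tags with disallowed values."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    for key, allowed in ALLOWED_VALUES.items():
        value = tags.get(key)
        if value and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return problems


if __name__ == "__main__":
    print(tag_problems({"env": "qa", "owner": "data-engineering"}))
```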

## Summary

### Setup Checklist

| Task | Status |
|------|--------|
| Unity Catalog metastore created | ☐ |
| Workspaces created (dev, staging, prod) | ☐ |
| Workspaces bound to metastore | ☐ |
| Catalogs and schemas created | ☐ |
| SQL warehouses configured | ☐ |
| All-purpose and jobs clusters configured | ☐ |
| Cluster policies applied | ☐ |
| Service principals created | ☐ |
| Storage credentials configured | ☐ |
| External locations defined | ☐ |
| Naming conventions documented | ☐ |
| Tagging strategy applied | ☐ |

### Next Steps

- Proceed to [**2.3 - Storage and Governance Design**]($./2.3 - Storage and Governance Design) to configure data organization
- Proceed to [**2.4 - Security and Access Design**]($./2.4 - Security and Access Design) for RBAC and security policies

<div style="color: #FF3621; font-weight: bold; font-size: 2em; margin-bottom: 12px;">COURSE DEVELOPER (remove before publishing)</div>

### Template Customization

**Placeholders to replace:**
- `{SOURCE_PLATFORM}` - Source platform name
- Cloud-specific examples (AWS, Azure, GCP)

**Platform-specific additions:**
- Add cloud-specific IAM/RBAC setup guides
- Include region-specific considerations
- Reference cloud-specific networking (Private Link, VNet injection, etc.)
- Add cloud-specific storage credential examples

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>
