<div style="display: flex; justify-content: space-between; align-items: center; padding: 8px 16px; background: #F8F9FA; border-bottom: 2px solid #E0E0E0; margin: 0; line-height: 1;">
    <div style="font-size: 14px; color: #666;">
        <span style="font-weight: bold; color: #333;">{SOURCE_PLATFORM} → Databricks Migration</span>
        <span style="margin-left: 8px; color: #999;">|</span>
        <span style="margin-left: 8px;">01 - Discover</span>
    </div>
    <div style="display: flex; align-items: center; gap: 8px;">
        <img src="https://cdn.simpleicons.org/snowflake/29B5E8" width="24" height="24"/>
        <span style="color: #999; font-size: 16px;">→</span>
        <img src="https://cdn.simpleicons.org/databricks/FF3621" width="24" height="24"/>
    </div>
</div>


<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# Profiling and Complexity Scoring

## Overview

This module covers automated and manual techniques for profiling your **{SOURCE_PLATFORM}** environment and scoring workload complexity. These quantitative measures help you prioritize migration efforts and estimate the resources required.

## Learning Objectives

By the end of this lesson, you will be able to:
- Use profiling tools to assess your environment
- Apply complexity scoring frameworks (T-shirt sizing)
- Generate migration effort estimates
- Identify high-risk components requiring special attention

## Profiling

Profiling the {SOURCE_PLATFORM} environment means analyzing several distinct dimensions, summarized below. Covering all of them gives you the information you need to prioritize workloads and estimate effort, and it reduces the risk of surprises later in the migration.

<br />
<div class="mermaid">
flowchart LR
    subgraph TOOLS["Profiling Activities"]
        direction TB
        A["Control Plane<br/>Analysis"]
        B["Data Plane<br/>Profiling"]
        C["Job<br/>Analysis"]
        D["Query<br/>Analysis"]
        E["CI/CD<br/>Analysis"]
    end
    TOOLS --> F["Migration<br/>Scope"]
    style TOOLS fill:#fff,stroke:#FF3621,stroke-width:2px
</div>
<script type="module"> import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs"; mermaid.initialize({ startOnLoad: true, theme: "default" }); </script>

| Activity | Focus | Output |
|----------|-------|--------|
| **Control Plane Analysis** | Account setup, IAM, identity, networking, landing zones | Infrastructure and security baseline |
| **Data Plane Profiling** | Schemas, tables, views, sizes, query patterns, compute usage | What data exists and how it's used |
| **Job Analysis** | Scheduled tasks, pipelines, dependencies, SLAs | What runs when and what depends on what |
| **Query Analysis** | Stored procedures, UDFs, SQL complexity | What needs conversion or rewrite |
| **CI/CD Analysis** | Deployment pipelines, environments, tooling | How changes get promoted |

Perform these activities at the system level and again for individual workloads and applications.

## Profiling Tools

Databricks recommends using automation tools to expedite gathering migration-related information. Manual discovery is error-prone and time-consuming.

| Tool | Purpose | Output |
|------|---------|--------|
| **Lakebridge** | End-to-end migration suite: profiler, converter, data migration, reconciler | Inventory, complexity scores, converted code |
| **BladeBridge Analyzer** | Code analysis and complexity assessment | Code inventory, complexity categorization, function usage |
| **Control Plane Tools** | CLIs, SDKs, StackQL, IaC code review (Terraform, CloudFormation, etc.) | Account config, IAM, networking, resource inventory |
| **Auditing Queries** | System table queries for workload analysis | Query patterns, compute usage, cost attribution |
| **Custom Queries** | Direct queries against system catalogs | Object inventory, sizes, dependencies |

## Control Plane Analysis

Control plane analysis focuses on infrastructure, identity, and account configuration - the foundation that supports your data platform.

### What to Analyze

| Area | Key Questions | Tools |
|------|---------------|-------|
| **Account Structure** | How are accounts/organizations structured? Regions? | CLIs, Console, StackQL |
| **IAM & Identity** | Users, roles, service accounts, federation? | IAM APIs, IaC review |
| **Networking** | VPCs, private endpoints, firewall rules? | Network config exports |
| **Resource Configuration** | Warehouse sizes, cluster policies, quotas? | Platform APIs, StackQL |
| **Secrets & Credentials** | Where are secrets stored? Rotation policies? | Secrets manager inventory |

### Control Plane Queries

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific control plane queries
-- Example: Query warehouse/compute configuration
SELECT 
    warehouse_name,
    warehouse_size,
    auto_suspend,
    auto_resume,
    resource_monitor
FROM information_schema.warehouses
ORDER BY warehouse_name;
</div>

### CLI Inventory

<div class="code-block" data-language="bash">
# REPLACEME: Platform-specific CLI commands
# Example: List compute resources
platform-cli warehouses list --format json

# Example: Export IAM configuration
platform-cli roles list --output roles.json
platform-cli users list --output users.json

# Example: Network configuration
platform-cli network-policies list
</div>

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-bash.min.js"></script>

<script>
(function() {
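    // Skip blocks already rendered by an earlier copy of this script.
    document.querySelectorAll('.code-block:not([data-rendered])').forEach(function(block) {
        block.setAttribute('data-rendered', 'true');
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substring(2, 11);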
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Data Plane Profiling

Data plane profiling examines the data assets themselves (schemas, tables, and storage volumes) and how they are used.

### What to Profile

| Area | Key Questions | Output |
|------|---------------|--------|
| **Object Inventory** | Databases, schemas, tables, views? | Complete object catalog |
| **Storage Volumes** | Table sizes, row counts, growth rates? | Capacity planning data |
| **Data Types** | Complex types, semi-structured data? | Conversion requirements |
| **Partitioning** | Current partitioning strategies? | Target table design |
| **Access Patterns** | Which tables are queried most? | Prioritization input |

### Example: Object Inventory Query

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific catalog query 
SELECT 
    table_catalog,
    table_schema,
    table_name,
    table_type,
    created,
    last_altered
FROM information_schema.tables
WHERE table_schema NOT IN ('INFORMATION_SCHEMA')
ORDER BY table_schema, table_name;
</div>
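
Once the inventory query result is exported to CSV, a per-schema count by object type helps confirm the catalog is complete and shows where base tables and views are concentrated. This is a minimal sketch, assuming a simple CSV export (header row, no embedded commas) in the column order of the query above; `summarize_objects` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count catalog objects per schema and object type.
# Assumed column order (CSV export of the inventory query):
#   table_catalog,table_schema,table_name,table_type,created,last_altered
# Assumes a header row and no commas inside values.
summarize_objects() {
  awk -F, 'FNR > 1 { count[$2 "," $4]++ }
           END { for (key in count) print key "," count[key] }' "$@" | sort
}

# Usage (hypothetical file name): summarize_objects object_inventory.csv
summarize_objects <<'EOF'
table_catalog,table_schema,table_name,table_type,created,last_altered
ANALYTICS,SALES,ORDERS,BASE TABLE,2023-01-01,2024-05-01
ANALYTICS,SALES,ORDERS_V,VIEW,2023-02-01,2024-05-01
ANALYTICS,SALES,CUSTOMERS,BASE TABLE,2023-01-01,2024-04-01
EOF
# Prints: SALES,BASE TABLE,2 then SALES,VIEW,1
```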

### Example: Storage Summary

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific storage query
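SELECT 
    table_schema,
    COUNT(*) AS table_count,
    SUM(row_count) AS total_rows,
    SUM(bytes) / POWER(1024, 4) AS total_tb  -- 1 TB = 1024^4 bytes; POWER avoids 32-bit integer overflow
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'  -- views have no storage of their own
GROUP BY table_schema
ORDER BY total_tb DESC;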
</div>
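
If you export table-level sizes rather than the per-schema aggregate (for example, to combine results from several accounts), you can do the same roll-up outside the source platform. This sketch assumes a hypothetical CSV with columns `table_schema,table_name,row_count,bytes`; `storage_by_schema` is an illustrative helper name, not part of any tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: roll up table sizes per schema from table-level CSV exports.
# Assumed column order: table_schema,table_name,row_count,bytes (header row, no embedded commas).
# Rows with an empty bytes value (for example, views) are skipped.
# 1 TB here means 1024^4 bytes, the same divisor as the storage query above.
storage_by_schema() {
  awk -F, '
    FNR > 1 && $4 != "" {
      tables[$1]++
      rows[$1] += $3
      bytes[$1] += $4
    }
    END {
      for (schema in bytes)
        printf "%s,%d,%.0f,%.3f\n", schema, tables[schema], rows[schema], bytes[schema] / (1024 ^ 4)
    }' "$@" | sort -t, -k4,4nr
}

# Usage (hypothetical file names): storage_by_schema sizes_account1.csv sizes_account2.csv
# Output columns: table_schema,table_count,total_rows,total_tb (largest first)
```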

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
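    // Skip blocks already rendered by an earlier copy of this script.
    document.querySelectorAll('.code-block:not([data-rendered])').forEach(function(block) {
        block.setAttribute('data-rendered', 'true');
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substring(2, 11);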
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Job Analysis

Job analysis inventories all scheduled workloads, their dependencies, and operational characteristics.

### What to Analyze

| Area | Key Questions | Output |
|------|---------------|--------|
| **Scheduled Jobs** | What tasks/jobs exist? Frequency? | Job inventory |
| **Dependencies** | What runs before/after? DAG structure? | Dependency graph |
| **SLAs** | Completion time requirements? | Success criteria |
| **Failure Handling** | Retry logic? Alerting? | Operational requirements |
| **Resource Usage** | Compute consumed per job? | Sizing and cost data |

### Example: Job Inventory Query

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific job/task query
SELECT 
    task_name,
    schedule,
    state,
    warehouse,
    predecessor_tasks,
    created_on,
    last_run_time
FROM information_schema.tasks
ORDER BY last_run_time DESC;
</div>
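
The `predecessor_tasks` column is the raw material for the dependency graph. As a sketch, assuming you flatten it into one `task_name,predecessor_task` pair per line and export that to CSV, the standard POSIX `tsort` utility can list jobs in an order where each job comes after everything it depends on, which is a reasonable starting sequence for migration waves. `migration_order` is a hypothetical helper name. If the graph contains a cycle, `tsort` reports a loop on standard error; resolve cycles before planning waves.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print jobs in an order where every job follows its predecessors.
# Assumed input (flattened CSV export), with header:
#   task_name,predecessor_task
# one pair per line, empty predecessor for root tasks, no whitespace in names.
migration_order() {
  awk -F, 'FNR > 1 {
    if ($2 == "") print $1, $1   # identical pair: declares a node, no ordering
    else          print $2, $1   # predecessor first, then the dependent task
  }' "$@" | tsort
}

# Usage (hypothetical file name): migration_order task_dependencies.csv
```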

### Example: Job History Analysis

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific job history query
SELECT 
    job_name,
    COUNT(*) AS run_count,
    AVG(duration_seconds) AS avg_duration,
    MAX(duration_seconds) AS max_duration,
    SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) AS failure_count
FROM job_history
WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY job_name
ORDER BY run_count DESC;
</div>
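
The `failure_count` and `run_count` columns from the job history query combine into a failure rate, which feeds the **Failure Handling** row of the table above. This is a minimal sketch, assuming the query result is exported to CSV in the same column order; the 5% default threshold and the `flag_unreliable_jobs` name are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag jobs whose failure rate (percent) meets or exceeds a threshold.
# Assumed input: CSV export of the job history query, with header
#   job_name,run_count,avg_duration,max_duration,failure_count
# The 5% default threshold is an illustrative assumption, not a standard.
flag_unreliable_jobs() {
  local threshold="${1:-5}"
  shift || true
  awk -F, -v t="$threshold" 'FNR > 1 && $2 > 0 {
    pct = 100 * $5 / $2
    if (pct >= t + 0) printf "%s,%.1f\n", $1, pct
  }' "$@"
}

# Usage (hypothetical file name): flag_unreliable_jobs 5 job_history.csv
# Output columns: job_name,failure_pct
```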

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
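    // Skip blocks already rendered by an earlier copy of this script.
    document.querySelectorAll('.code-block:not([data-rendered])').forEach(function(block) {
        block.setAttribute('data-rendered', 'true');
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substring(2, 11);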
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

## Query Analysis

Query analysis examines SQL code complexity and function usage to identify conversion requirements.

### What to Analyze

| Area | Key Questions | Output |
|------|---------------|--------|
| **Query Patterns** | SELECT, DML, DDL breakdown? | Workload characterization |
| **Code Complexity** | Stored procedures, UDFs, loops? | Conversion effort |
| **Function Usage** | Platform-specific functions? | Compatibility assessment |
| **Performance** | Long-running queries? Resource hogs? | Optimization candidates |

### Example: Query Pattern Analysis

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific query history
SELECT 
    query_type,
    COUNT(*) AS query_count,
    AVG(execution_time_ms) AS avg_time_ms,
    SUM(credits_used) AS total_credits,
    COUNT(DISTINCT user_name) AS unique_users
FROM query_history
WHERE start_time > DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY query_type
ORDER BY total_credits DESC;
</div>
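
To see which statement types dominate cost, you can convert `total_credits` into a share of all credits. This is a minimal sketch, assuming the query result is exported to CSV in the same column order; `credit_share` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: express total_credits per query type as a percentage of all credits.
# Assumed input: CSV export of the query pattern analysis, with header
#   query_type,query_count,avg_time_ms,total_credits,unique_users
# Rows are printed in input order (the query already sorts by credits).
credit_share() {
  awk -F, 'FNR > 1 { n++; type[n] = $1; credits[n] = $4; total += $4 }
    END {
      if (total <= 0) exit
      for (i = 1; i <= n; i++)
        printf "%s,%.1f\n", type[i], 100 * credits[i] / total
    }' "$@"
}

# Usage (hypothetical file name): credit_share query_patterns.csv
```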

### Example: Stored Procedure Inventory

<div class="code-block" data-language="sql">
-- REPLACEME: Platform-specific procedure query
SELECT 
    procedure_schema,
    procedure_name,
    argument_signature,
    procedure_language,
    LENGTH(procedure_definition) AS code_length
FROM information_schema.procedures
ORDER BY code_length DESC;
</div>
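
The **Code Complexity** scoring factor treats stored procedures with loops or cursors as the high end of the scale. If procedure definitions are exported one per `.sql` file, a keyword search gives a first-pass list of those procedures. This is a heuristic sketch: the keyword list (`CURSOR`, `LOOP`, `WHILE`) is an assumption, matches in comments or strings are not filtered out, and `procs_with_loops` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list exported procedure scripts containing loop or cursor keywords.
# Assumes one procedure definition per .sql file under the given directory.
# Heuristic only: the keyword list is an assumption, matching is
# case-insensitive on whole words, and comments or strings are not excluded.
procs_with_loops() {
  local dir="${1:-.}"
  grep -rliwE --include='*.sql' 'CURSOR|LOOP|WHILE' "$dir" | sort
}

# Usage (hypothetical directory): procs_with_loops ./exported_procedures
```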

<link href="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/themes/prism.min.css" rel="stylesheet" />
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/prism.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/prism/1.29.0/components/prism-sql.min.js"></script>

<script>
(function() {
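    // Skip blocks already rendered by an earlier copy of this script.
    document.querySelectorAll('.code-block:not([data-rendered])').forEach(function(block) {
        block.setAttribute('data-rendered', 'true');
        var lang = block.getAttribute('data-language') || 'sql';
        var code = block.textContent.trim();
        var id = 'code-' + Math.random().toString(36).substring(2, 11);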
        
        block.innerHTML = 
            '<div style="position:relative;margin:16px 0;">' +
                '<button class="copy-btn" style="position:absolute;top:8px;right:8px;padding:4px 12px;font-size:12px;background:#ddd;color:#333;border:1px solid #ccc;border-radius:4px;cursor:pointer;z-index:10;">Copy</button>' +
                '<pre style="background:#f8f8f8;border-radius:8px;padding:16px;padding-top:40px;overflow-x:auto;margin:0;border:1px solid #e0e0e0;"><code id="' + id + '" class="language-' + lang + '" style="font-family:Consolas,Monaco,monospace;font-size:14px;"></code></pre>' +
            '</div>';
        
        var codeEl = document.getElementById(id);
        codeEl.textContent = code;
        Prism.highlightElement(codeEl);
        
        block.querySelector('.copy-btn').onclick = function() {
            var t = document.createElement('textarea');
            t.value = code;
            document.body.appendChild(t);
            t.select();
            document.execCommand('copy');
            document.body.removeChild(t);
            this.textContent = '✓ Copied!';
            setTimeout(() => this.textContent = 'Copy', 2000);
        };
    });
})();
</script>

### BladeBridge Analyzer

For comprehensive code analysis, run BladeBridge Analyzer against your SQL codebase. It generates:

| Output | Description |
|--------|-------------|
| **Code Inventory** | DDLs, views, procedures, functions, tasks |
| **Complexity Scores** | Low, medium, complex, very complex per script |
| **Function Usage** | List of all functions and data types used |
| **Cross-references** | Table popularity and dependencies |

## CI/CD Analysis

CI/CD analysis documents how code and configuration changes flow from development to production.

### What to Analyze

| Area | Key Questions | Output |
|------|---------------|--------|
| **Source Control** | Git repos? Branching strategy? | Code locations |
| **Deployment Pipelines** | Jenkins, GitHub Actions, Azure DevOps? | Pipeline inventory |
| **Environments** | Dev, staging, prod? How provisioned? | Environment map |
| **IaC** | Terraform, Pulumi, CloudFormation? | Infrastructure code review |
| **Testing** | Automated tests? Data validation? | Quality gates |

### What to Capture
- Repository locations and structure
- Pipeline definitions (export YAML/JSON)
- Environment configurations
- Deployment frequency and lead time
- Rollback procedures
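
Deployment frequency and lead time can be computed from whatever deployment log your CI/CD tool exports. This sketch assumes a hypothetical CSV with one row per production deployment and Unix timestamps for the commit and the deployment; adjust the parsing to your tool's actual export format. `deploy_metrics` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize deployment frequency and lead time from a deployment log.
# Assumed input (hypothetical CSV exported from your CI/CD tool), with header:
#   deploy_id,commit_epoch,deploy_epoch     (Unix timestamps in seconds)
# Lead time = deploy time minus commit time; frequency is per 7 days
# over the window passed as the first argument (default 30 days).
deploy_metrics() {
  local window_days="${1:-30}"
  shift || true
  if (( window_days <= 0 )); then
    echo "window_days must be a positive integer" >&2
    return 1
  fi
  awk -F, -v days="$window_days" '
    FNR > 1 { n++; lead += $3 - $2 }
    END {
      if (n == 0) { print "deployments=0"; exit }
      printf "deployments=%d per_week=%.2f mean_lead_time_hours=%.1f\n",
             n, n * 7 / days, lead / n / 3600
    }' "$@"
}

# Usage (hypothetical file name): deploy_metrics 30 deployments.csv
```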

### IaC Review Checklist

| IaC Component | Migration Impact |
|---------------|------------------|
| **Resource Definitions** | Map to Databricks equivalents |
| **Variables & Secrets** | Update for new platform |
| **State Files** | Plan for state migration |
| **Modules** | Identify which modules can be reused and which must be rewritten |
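
For the **Resource Definitions** row, a count of resource types declared in Terraform code is a practical starting point for mapping each type to a Databricks equivalent. This is a text scan, not a Terraform parser, and it covers only `.tf` files; CloudFormation or Pulumi code needs a different approach. `tf_resource_types` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Terraform resource types declared in .tf files under a directory.
# Text scan, not a parser: it recognizes `resource "TYPE" "NAME"` declarations
# that start a line (after optional indentation).
tf_resource_types() {
  local dir="${1:-.}"
  grep -rhoE --include='*.tf' '^[[:space:]]*resource[[:space:]]+"[^"]+"' "$dir" \
    | sed -E 's/.*"([^"]+)"$/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage (hypothetical directory): tf_resource_types ./infrastructure
```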

## Complexity Scoring Matrix

T-shirt sizing provides a simple framework to categorize workloads and estimate migration effort.

### T-Shirt Sizes

| Size | Effort | Characteristics |
|------|--------|-----------------|
| **XS** | < 1 day | Simple tables, basic views, standard SQL |
| **S** | 1-3 days | Standard ETL, few dependencies, minor syntax changes |
| **M** | 1-2 weeks | Complex queries, multiple dependencies, some rewrites |
| **L** | 2-4 weeks | Stored procedures, custom logic, significant refactoring |
| **XL** | 1+ months | Major redesign, complex integrations, extensive testing |
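
To turn sized workloads into an effort estimate, translate each size into a range of working days and sum them. The day bounds in this sketch are assumptions (5 working days per week, 20 per month, XS treated as 0 to 1 day), and the input is a hypothetical CSV of `workload,size` pairs; `effort_range` is a hypothetical helper name. The result is total effort, not a calendar schedule, because work done in parallel shortens elapsed time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: roll up T-shirt sizes into a total effort range in working days.
# Assumed input (hypothetical inventory export), with header: workload,size
# Day bounds per size are assumptions derived from the T-shirt size table,
# using 5 working days per week and 20 per month. XL has no upper bound.
effort_range() {
  awk -F, 'FNR > 1 {
    size = toupper($2)
    if      (size == "XS") { lo += 0;  hi += 1 }
    else if (size == "S")  { lo += 1;  hi += 3 }
    else if (size == "M")  { lo += 5;  hi += 10 }
    else if (size == "L")  { lo += 10; hi += 20 }
    else if (size == "XL") { lo += 20; xl++ }
    else printf "skipping %s: unknown size \"%s\"\n", $1, $2 > "/dev/stderr"
  }
  END {
    if (xl) printf "%d+ working days (%d XL item(s), no upper bound)\n", lo, xl
    else    printf "%d-%d working days\n", lo, hi
  }' "$@"
}

# Usage (hypothetical file name): effort_range workload_sizes.csv
```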

### Scoring Factors

| Factor | Low (1) | High (5) |
|--------|---------|----------|
| **Code Complexity** | Simple SQL | Stored procedures with loops/cursors |
| **Data Volume** | < 100 GB | > 10 TB |
| **Dependencies** | Standalone | Many upstream/downstream |
| **Platform Features** | Standard SQL | Proprietary functions |
| **Business Criticality** | Dev/test | Revenue-critical |
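
The table defines only the endpoints of each factor, so any scoring formula requires assumptions. As one illustrative approach, this sketch scores each factor from 1 to 5, weights them equally, and maps the 5 to 25 total onto the T-shirt sizes. The volume bands between 100 GB and 10 TB, the equal weighting, and the size cut-offs are hypothetical, as are the helper names `volume_score` and `tshirt_size`; calibrate them against workloads you have already migrated.

```bash
#!/usr/bin/env bash
# Hypothetical scoring helpers for the Scoring Factors table.
# Only the data volume endpoints come from the table (< 100 GB = 1, > 10 TB = 5);
# the intermediate bands, the equal weighting, and the size cut-offs are assumptions.
GB=$((1024 ** 3))
TB=$((1024 ** 4))

# Map a size in bytes to a 1-5 data volume score.
volume_score() {
  local bytes=$1
  if   (( bytes < 100 * GB )); then echo 1
  elif (( bytes < TB ));       then echo 2
  elif (( bytes < 5 * TB ));   then echo 3
  elif (( bytes <= 10 * TB )); then echo 4
  else                              echo 5
  fi
}

# Sum five factor scores (each 1-5) and map the 5-25 total to a T-shirt size.
# Argument order: code_complexity data_volume dependencies platform_features business_criticality
tshirt_size() {
  if (( $# != 5 )); then
    echo "usage: tshirt_size <five scores, each 1-5>" >&2
    return 1
  fi
  local total=0 score
  for score in "$@"; do
    if (( score < 1 || score > 5 )); then
      echo "score out of range: $score" >&2
      return 1
    fi
    total=$(( total + score ))
  done
  if   (( total <= 7 ));  then echo XS
  elif (( total <= 11 )); then echo S
  elif (( total <= 15 )); then echo M
  elif (( total <= 19 )); then echo L
  else                         echo XL
  fi
}

# Example: tshirt_size 4 "$(volume_score $((2 * TB)))" 2 5 3   # 4+3+2+5+3 = 17 -> L
```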

### Risk Indicators

| Risk Level | Indicators | Approach |
|------------|------------|----------|
| **Low** | Simple SQL, small data, no dependencies | Standard migration |
| **Medium** | Some complexity, moderate data | Extended testing |
| **High** | Stored procs, TB-scale, many dependencies | POC first, detailed planning |
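
The risk indicators can also be expressed as a simple rule set. In this hypothetical sketch, "TB-scale" means at least 1 TB (1024^4 bytes), "many dependencies" means five or more, and "small data" means under 100 GB (the low end of the data volume factor); these cut-offs and the `risk_level` helper name are assumptions, not part of the framework.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify migration risk following the Risk Indicators table.
# Arguments: has_stored_procs (0 or 1), data size in bytes, dependency count.
# Assumed cut-offs: TB-scale = at least 1024^4 bytes, many dependencies = 5 or more,
# small data = under 100 GB (the low end of the data volume factor).
risk_level() {
  local has_procs=$1 bytes=$2 deps=$3
  local gb=$((1024 ** 3)) tb=$((1024 ** 4))
  if (( has_procs || bytes >= tb || deps >= 5 )); then
    echo High      # POC first, detailed planning
  elif (( bytes < 100 * gb && deps == 0 )); then
    echo Low       # standard migration
  else
    echo Medium    # extended testing
  fi
}

# Example: risk_level 1 $((200 * 1024 ** 3)) 2   # High: contains stored procedures
```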

## Summary

### Profiling Deliverables

| Deliverable | Source |
|-------------|--------|
| Control plane inventory | CLIs, StackQL, IaC review |
| Data plane catalog | System catalog queries, Lakebridge |
| Job inventory and dependencies | Task/job queries |
| Code complexity assessment | BladeBridge Analyzer |
| CI/CD documentation | Pipeline and IaC review |
| Complexity scores | Scoring matrix |

### Next Steps

- Run profiling queries and tools against your {SOURCE_PLATFORM} environment
- Document findings in a migration inventory
- Apply complexity scoring to prioritize workloads
- Proceed to [**1.5 - Planning and Road-mapping**]($./1.5 - Planning and Road-mapping) to build your migration roadmap

&copy; 2026 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>
