# GlobalSupply Corp - Migration Assessment Analysis

## 🏢 Business Context
**GlobalSupply Corp** operates a complex supply chain data warehouse built on SQL Server with TPC-H-style schemas. This workshop shows how to migrate that legacy system to Databricks with Lakebridge to enable:

- **AI-powered supply chain optimization**
- **Natural language queries for business users**  
- **Real-time analytics and forecasting**
- **Scalable cloud-native architecture**

## 📊 What This Notebook Covers
- **Migration Complexity Analysis** - Which components require the most effort
- **Dependency Mapping** - Critical interdependencies affecting migration sequencing
- **Risk Assessment** - Potential challenges and mitigation strategies
- **Business Value Analysis** - ROI projections for Databricks migration

---

## 🔧 Prerequisites & Setup

### Step 1: Install Required Python Dependencies
```bash
# Install essential packages for analysis and Excel support
pip install pandas matplotlib seaborn numpy openpyxl
```

**Note:** `openpyxl` is required to read Excel files generated by Lakebridge Analyzer.
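Before running the notebook, a quick import check confirms that the Step 1 packages are available. This is a minimal sketch that uses only the standard library. `missing_packages` is a hypothetical helper, and it assumes each package's import name matches its pip name, which holds for the five listed here.

```python
import importlib.util

# Import names for the packages installed in Step 1 (assumed to match the pip names)
REQUIRED_MODULES = ["pandas", "matplotlib", "seaborn", "numpy", "openpyxl"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the modules that cannot be found on the current Python path (hypothetical helper)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print(f"Missing packages: {', '.join(missing)}; install them with pip before continuing")
else:
    print("All required packages are importable")
```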

### Step 2: Install Lakebridge (Optional - for real assessments)
```bash
# Install Databricks Labs Lakebridge
databricks labs install lakebridge

# Verify installation
databricks labs lakebridge analyze --help
```
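When working from a notebook, you can run the same verification from Python before attempting a real assessment. This is a sketch under one assumption: the Databricks CLI is installed as an executable named `databricks` on `PATH`. `databricks_cli_available` is a hypothetical helper, and the only Lakebridge command it runs is the `--help` check shown above.

```python
import shutil
import subprocess


def databricks_cli_available():
    """Return True if a `databricks` executable is on PATH (hypothetical helper)."""
    return shutil.which("databricks") is not None


if databricks_cli_available():
    # Same verification command as above; check=False reports a failure instead of raising
    result = subprocess.run(
        ["databricks", "labs", "lakebridge", "analyze", "--help"],
        capture_output=True, text=True, check=False,
    )
    print("Lakebridge analyze is available" if result.returncode == 0
          else f"Lakebridge check failed:\n{result.stderr}")
else:
    print("Databricks CLI not found on PATH; Lakebridge-based assessment is unavailable")
```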

### Step 3: Prepare Legacy SQL Files
For real assessments, export your SQL Server workloads to a directory structure like:
```
legacy_sql/
├── analytics/
│   ├── supply_chain_performance.sql
│   ├── inventory_optimization.sql
│   └── customer_profitability.sql
├── reports/
│   ├── financial_summary.sql
│   └── operational_reports.sql
└── etl/
    ├── data_processing.sql
    └── transformations.sql
```

### Step 4: Run Assessment (with sample data or real SQL files)
```bash
# Option 1: Use sample SQL files for demonstration
python 01_assessment_analyzer.py --generate-samples

# Option 2: Assess real SQL files
python 01_assessment_analyzer.py --source-directory /path/to/legacy_sql

# Option 3: Use Lakebridge directly (if installed and configured)
databricks labs lakebridge analyze \
  --source-directory /path/to/legacy_sql \
  --report-file globalsupply_assessment \
  --source-tech mssql
```
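To launch Option 3 from Python, build the command as an argument list instead of a single shell string. This avoids quoting problems with paths that contain spaces. The sketch below uses only the flags shown above. `build_analyze_command` is a hypothetical helper, not part of Lakebridge.

```python
import shlex


def build_analyze_command(source_directory, report_file, source_tech="mssql"):
    """Build the Option 3 `lakebridge analyze` invocation as an argument list.

    Uses only the flags shown above; this is a hypothetical helper, not part of Lakebridge.
    """
    return [
        "databricks", "labs", "lakebridge", "analyze",
        "--source-directory", str(source_directory),
        "--report-file", report_file,
        "--source-tech", source_tech,
    ]


cmd = build_analyze_command("/path/to/legacy_sql", "globalsupply_assessment")
# Print a shell-safe version; pass `cmd` to subprocess.run(cmd, check=True) to execute it
print(shlex.join(cmd))
```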

## 📋 Understanding Lakebridge Analyzer Output

The Lakebridge Analyzer generates comprehensive insights including:

### 🔍 Analysis Insights
- **Job Complexity Assessment** - Complexity scores (1-10) for each SQL component
- **Comprehensive Job Inventory** - All mappings, transformations, functions cataloged
- **Cross-System Interdependency Mapping** - Shows how components interact
- **Migration Effort Estimates** - Engineering hours required per component
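To turn 1-10 complexity scores into planning buckets, `pd.cut` works well. This sketch assumes the thresholds that the complexity histogram later in this notebook marks with dotted lines: 6 for medium and 8 for high. These cut-offs are an assumption, not Lakebridge's own classification, and `band_complexity` is a hypothetical helper.

```python
import pandas as pd


def band_complexity(scores, medium=6.0, high=8.0):
    """Bucket 1-10 complexity scores into Low/Medium/High (thresholds are assumptions)."""
    # right=False gives bins [-inf, 6), [6, 8), [8, inf): a score of exactly 6 is Medium
    return pd.cut(
        pd.Series(scores, dtype=float),
        bins=[float("-inf"), medium, high, float("inf")],
        labels=["Low", "Medium", "High"],
        right=False,
    )


print(band_complexity([4.5, 6.5, 8.5, 9.8]).tolist())
```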

### 📊 Key Outputs
- **Excel Report** with multiple worksheets:
  - Complexity Analysis
  - Dependency Mapping  
  - Function Usage Statistics
  - Migration Estimates
  - Risk Assessment

### 🎯 Business Value
- **Risk Assessment** for migration planning
- **Resource Planning** for modernization project  
- **Sequencing Guidance** to minimize disruption
- **TCO Analysis** for Databricks migration ROI
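For the TCO and ROI item, a simple payback and multi-year ROI calculation is often enough for a first pass. This is a sketch under stated assumptions. `migration_roi` is a hypothetical helper, it ignores ongoing Databricks run costs unless you net them into `annual_savings`, and it applies no discounting. The example uses the sample data's 120 total migration hours at $150/hour and a hypothetical $50,000 annual saving. These are placeholders, not assessment results.

```python
def migration_roi(migration_cost, annual_savings, years=3):
    """Return (payback_years, roi_fraction) for a one-off cost and flat annual savings.

    Assumes savings are already net of new platform costs and ignores discounting.
    """
    if migration_cost <= 0 or annual_savings <= 0:
        raise ValueError("migration_cost and annual_savings must both be positive")
    payback_years = migration_cost / annual_savings
    roi = (annual_savings * years - migration_cost) / migration_cost
    return payback_years, roi


# Placeholder inputs: 120 hours at $150/hour, and a hypothetical $50,000/year saving
payback, roi = migration_roi(migration_cost=120 * 150, annual_savings=50_000)
print(f"Payback: {payback:.2f} years, 3-year ROI: {roi:.0%}")
```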

In [None]:
# Import required libraries for assessment analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Configure plotting style for professional reports
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("✅ Libraries imported successfully")
print("📊 Ready for GlobalSupply Corp assessment analysis")

## 📄 Load Assessment Results

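The Lakebridge Analyzer generates an Excel report. This section loads its worksheets into a `sheets_data` dictionary for analysis. The analysis cells further down look up the keys `Complexity_Analysis`, `Dependencies`, and `Function_Usage`. Worksheet names in a real report may differ, so map them to these keys before running those cells; otherwise the cells skip with a "not available" message.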

### Expected Excel Worksheets:
- **Summary** - High-level overview
- **Complexity Analysis** - Detailed complexity scores
- **Dependencies** - Component interdependencies
- **Functions** - SQL function usage statistics
- **Migration Estimates** - Effort and timeline projections
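To bridge the listed worksheet names and the keys the analysis cells use, you can normalize sheet names after loading. This is a minimal sketch based on the worksheet titles above. The alias table and `normalize_sheet_keys` are hypothetical, so adjust them to the sheet names in an actual Lakebridge report. Running `sheets_data = normalize_sheet_keys(sheets_data)` after the loading cell applies the mapping.

```python
import re

import pandas as pd

# Keys the analysis cells in this notebook look up in `sheets_data`
EXPECTED_KEYS = {
    "complexity_analysis": "Complexity_Analysis",
    "dependencies": "Dependencies",
    "function_usage": "Function_Usage",
    "functions": "Function_Usage",  # assumed alias for a sheet named "Functions"
}


def normalize_sheet_keys(sheets):
    """Rename loaded sheets to the keys used downstream (hypothetical helper).

    Names are compared case-insensitively, with runs of spaces or hyphens treated as
    underscores; unrecognized sheets keep their original names.
    """
    normalized = {}
    for name, frame in sheets.items():
        canonical = re.sub(r"[\s\-]+", "_", str(name).strip()).lower()
        normalized[EXPECTED_KEYS.get(canonical, name)] = frame
    return normalized


demo = {"Complexity Analysis": pd.DataFrame(), "Functions": pd.DataFrame(), "Summary": pd.DataFrame()}
print(sorted(normalize_sheet_keys(demo)))
```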

In [None]:
def find_assessment_report():
    """
    Find the most recent Lakebridge assessment report
    """
    current_dir = Path('.')
    
    # Look for assessment reports with common naming patterns
    patterns = ['*assessment*.xlsx', '*globalsupply*.xlsx', '*remorph*.xlsx']
    excel_files = []
    
    for pattern in patterns:
        excel_files.extend(list(current_dir.glob(pattern)))
    
    if not excel_files:
        # Look for any Excel files
        excel_files = list(current_dir.glob('*.xlsx'))
    
    if excel_files:
        # Return the most recent file
        latest_file = max(excel_files, key=lambda x: x.stat().st_mtime)
        return latest_file
    else:
        return None

# Attempt to load the assessment report
report_file = find_assessment_report()

if report_file:
    print(f"📄 Found assessment report: {report_file}")
    
    try:
        # Load the Excel file and examine structure
        excel_data = pd.ExcelFile(report_file)
        print(f"📋 Available worksheets: {excel_data.sheet_names}")
        
        # Load key worksheets
        sheets_data = {}
        for sheet_name in excel_data.sheet_names:
            try:
                sheets_data[sheet_name] = excel_data.parse(sheet_name)
                print(f"✅ Loaded '{sheet_name}': {len(sheets_data[sheet_name])} rows, {len(sheets_data[sheet_name].columns)} columns")
            except Exception as e:
                print(f"⚠️ Could not load sheet '{sheet_name}': {e}")
        
    except Exception as e:
        print(f"❌ Error loading Excel file: {e}")
        sheets_data = {}
        
else:
    print("❌ No assessment report found!")
    print("")
    print("📝 To generate an assessment report:")
    print("   1. Ensure Lakebridge is installed: databricks labs install lakebridge")
    print("   2. Run assessment: databricks labs lakebridge analyze --source-directory /path/to/sql --source-tech mssql")
    print("   3. Place the generated .xlsx file in this directory")
    print("")
    print("🎭 For demonstration purposes, we'll create sample data...")
    sheets_data = {}

## 🎭 Sample Data for Demonstration

If no real assessment report is available, we'll create representative sample data that demonstrates typical findings from a GlobalSupply Corp assessment.

### Sample Scenario:
- **8 SQL files** representing typical supply chain analytics
- **Mixed complexity levels** from simple reports to complex analytics
- **Realistic dependencies** between components
- **Representative migration effort estimates**
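Whether the data comes from the sample generator or a real report, the later cells assume specific column names. Here is a short pre-flight check. `check_complexity_frame` is a hypothetical helper, and the column list comes from the columns the cells below index directly on `Complexity_Analysis`.

```python
import pandas as pd

# Columns the overview, visualization, and planning cells read from Complexity_Analysis
REQUIRED_COLUMNS = [
    "File_Name", "Lines_of_Code", "Complexity_Score",
    "Migration_Hours", "Risk_Level", "Category", "SQL_Features",
]


def check_complexity_frame(frame):
    """Return the required columns missing from `frame` (an empty list means ready)."""
    return [col for col in REQUIRED_COLUMNS if col not in frame.columns]


sample = pd.DataFrame({"File_Name": ["financial_summary.sql"], "Complexity_Score": [4.5]})
print(check_complexity_frame(sample))
```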

In [None]:
# Create realistic sample data if no real assessment report exists
if not sheets_data:
    print("🎭 Creating sample GlobalSupply Corp assessment data for demonstration...")
    
    # Sample complexity analysis representing real-world SQL Server workloads
    complexity_data = pd.DataFrame({
        'File_Name': [
            'supply_chain_performance.sql',
            'inventory_optimization.sql', 
            'customer_profitability.sql',
            'supplier_risk_assessment.sql',
            'dynamic_reporting.sql',
            'window_functions_analysis.sql',
            'financial_summary.sql',
            'order_processing.sql'
        ],
        'Lines_of_Code': [145, 198, 167, 223, 89, 134, 67, 89],
        'Complexity_Score': [8.5, 9.2, 7.8, 9.8, 6.5, 7.2, 4.5, 5.1],
        'Functions_Used': [12, 18, 14, 22, 8, 16, 6, 9],
        'Table_References': [8, 6, 7, 9, 5, 4, 3, 4],
        'Migration_Hours': [16, 24, 18, 32, 8, 12, 4, 6],
        'Risk_Level': ['Medium', 'High', 'Medium', 'High', 'Low', 'Medium', 'Low', 'Low'],
        'Category': ['Analytics', 'Analytics', 'Analytics', 'Analytics', 'Reporting', 'Analytics', 'Reporting', 'OLTP'],
        'SQL_Features': [
            'CTEs, Window Functions, Complex Joins',
            'Recursive CTEs, PIVOT, Advanced Analytics', 
            'PIVOT, Window Functions, String Aggregation',
            'Recursive CTEs, Dynamic SQL, Risk Scoring',
            'Dynamic SQL, Conditional Logic',
            'Advanced Window Functions, LAG/LEAD',
            'Basic Aggregation, Simple Joins',
            'CRUD Operations, Transactions'
        ]
    })
    
    # Sample dependency mapping
    dependency_data = pd.DataFrame({
        'Source_Object': [
            'customer_profitability.sql', 
            'supplier_risk_assessment.sql', 
            'inventory_optimization.sql',
            'supply_chain_performance.sql',
            'window_functions_analysis.sql'
        ],
        'Target_Object': [
            'financial_summary.sql', 
            'supply_chain_performance.sql', 
            'order_processing.sql',
            'dynamic_reporting.sql',
            'customer_profitability.sql'
        ],
        'Dependency_Type': ['View', 'Procedure', 'Table', 'View', 'Function'],
        'Criticality': ['High', 'High', 'Medium', 'Medium', 'Low']
    })
    
    # Sample function usage statistics
    function_data = pd.DataFrame({
        'Function_Name': [
            'ROW_NUMBER', 'RANK', 'LAG', 'LEAD', 'SUM', 'AVG', 'COUNT', 'DATEDIFF', 
            'DATEADD', 'STRING_AGG', 'PIVOT', 'CASE', 'CTE', 'RECURSIVE_CTE'
        ],
        'Usage_Count': [15, 12, 8, 6, 25, 20, 30, 18, 14, 5, 3, 22, 18, 2],
        'Complexity_Impact': [3, 3, 4, 4, 1, 1, 1, 2, 2, 4, 5, 2, 3, 5],
        'Databricks_Compatibility': [
            'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Modified',
            'Modified', 'Modified', 'Modified', 'Direct', 'Direct', 'Complex'
        ]
    })
    
    sheets_data = {
        'Complexity_Analysis': complexity_data,
        'Dependencies': dependency_data,
        'Function_Usage': function_data
    }
    
    print("✅ Sample data created with realistic GlobalSupply Corp scenarios")
    print(f"📊 Generated {len(complexity_data)} SQL components for analysis")
    print(f"🔗 Created {len(dependency_data)} dependency relationships")
    print(f"🔧 Analyzed {len(function_data)} SQL functions")

# Display basic statistics about the loaded data
print("\n" + "="*70)
print("📊 ASSESSMENT DATA SUMMARY")
print("="*70)

for sheet_name, df in sheets_data.items():
    print(f"📋 {sheet_name}: {len(df)} rows, {len(df.columns)} columns")
    if len(df.columns) > 0:
        print(f"   Columns: {', '.join(map(str, df.columns[:5]))}{'...' if len(df.columns) > 5 else ''}")

print("="*70)

## 🎯 Assessment Overview Dashboard

This section provides a high-level overview of the migration assessment, showing key metrics that executives and project managers need to understand the scope and complexity of the modernization effort.
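The overview's dollar and week figures come from simple arithmetic on total hours. The sketch below makes the formula explicit. `estimate_effort` is a hypothetical helper, and the $150 rate and 40-hour week are the same assumptions the overview cell hard-codes. It does not model ramp-up, coordination overhead, or partial utilization.

```python
def estimate_effort(total_hours, hourly_rate=150, team_size=1, hours_per_week=40):
    """Return (cost, weeks) for a given effort, mirroring the overview cell's arithmetic.

    Assumes full utilization: weeks = hours / (team_size * hours_per_week).
    """
    cost = total_hours * hourly_rate
    weeks = total_hours / (team_size * hours_per_week)
    return cost, weeks


# 120 hours is the total of Migration_Hours in the sample data
for team in (1, 3, 4):
    cost, weeks = estimate_effort(120, team_size=team)
    print(f"{team} developer(s): ${cost:,} over {weeks:.2f} weeks")
```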

In [None]:
# Generate comprehensive assessment overview
if 'Complexity_Analysis' in sheets_data:
    df = sheets_data['Complexity_Analysis']
    
    print("\n" + "="*80)
    print("📊 GLOBALSUPPLY CORP - MIGRATION ASSESSMENT OVERVIEW")
    print("="*80)
    
    # Key metrics
    total_files = len(df)
    total_loc = df['Lines_of_Code'].sum()
    total_hours = df['Migration_Hours'].sum()
    avg_complexity = df['Complexity_Score'].mean()
    
    print(f"📁 Total SQL Components: {total_files}")
    print(f"📏 Total Lines of Code: {total_loc:,}")
    print(f"⏱️  Estimated Migration Hours: {total_hours}")
    print(f"📈 Average Complexity Score: {avg_complexity:.1f}/10")
    print(f"💰 Estimated Cost (@$150/hour): ${total_hours * 150:,}")
    
    # Risk distribution analysis
    risk_distribution = df['Risk_Level'].value_counts()
    print(f"\n🚦 RISK DISTRIBUTION:")
    for risk, count in risk_distribution.items():
        percentage = (count/len(df)*100)
        risk_emoji = {'Low': '🟢', 'Medium': '🟡', 'High': '🔴'}
        print(f"   {risk_emoji.get(risk, '⚪')} {risk}: {count} files ({percentage:.1f}%)")
    
    # Category distribution
    category_distribution = df['Category'].value_counts()
    print(f"\n📊 WORKLOAD CATEGORIES:")
    for category, count in category_distribution.items():
        percentage = (count/len(df)*100)
        print(f"   📈 {category}: {count} files ({percentage:.1f}%)")
    
    # Timeline estimates
    weeks_single = total_hours / 40
    weeks_team = total_hours / 160  # 4-person team
    
    print(f"\n📅 TIMELINE ESTIMATES:")
    print(f"   👤 Single Developer: {weeks_single:.1f} weeks")
    print(f"   👥 4-Person Team: {weeks_team:.1f} weeks")
    
    print("\n💡 KEY INSIGHTS:")
    high_risk_count = len(df[df['Risk_Level'] == 'High'])
    complex_count = len(df[df['Complexity_Score'] > 8])
    
    if high_risk_count > 0:
        print(f"   ⚠️  {high_risk_count} high-risk components need expert attention")
    if complex_count > 0:
        print(f"   🧠 {complex_count} highly complex components (>8/10 complexity)")
    print(f"   🎯 Recommended phased approach in 3 migration waves")
    print("   📈 Performance target: 3-5x faster queries (validate with benchmarks after migration)")
    
    print("="*80)
else:
    print("⚠️ Complexity analysis data not available")

## 📈 Complexity Analysis Visualizations

These visualizations help identify which components will require the most attention during migration and provide insights into the overall complexity distribution of the SQL codebase.
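The dashed trend line in the effort-vs-complexity scatter plot is a degree-1 `np.polyfit`. To report that relationship as numbers, you can compute the slope and Pearson correlation directly. This is a small sketch with a hypothetical `effort_trend` helper. With only eight components, treat the result as indicative rather than as a reliable estimator.

```python
import numpy as np


def effort_trend(scores, hours):
    """Fit hours = slope * score + intercept and return (slope, intercept, pearson_r)."""
    scores = np.asarray(scores, dtype=float)
    hours = np.asarray(hours, dtype=float)
    slope, intercept = np.polyfit(scores, hours, 1)
    r = np.corrcoef(scores, hours)[0, 1]
    return slope, intercept, r


# Complexity_Score and Migration_Hours from the sample data
scores = [8.5, 9.2, 7.8, 9.8, 6.5, 7.2, 4.5, 5.1]
hours = [16, 24, 18, 32, 8, 12, 4, 6]
slope, intercept, r = effort_trend(scores, hours)
print(f"~{slope:.1f} extra hours per complexity point (r = {r:.2f})")
```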

In [None]:
if 'Complexity_Analysis' in sheets_data:
    df = sheets_data['Complexity_Analysis']
    
    # Create comprehensive complexity dashboard
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('GlobalSupply Corp - Migration Complexity Analysis Dashboard', 
                 fontsize=16, fontweight='bold', y=0.98)
    
    # 1. Complexity Score Distribution
    axes[0, 0].hist(df['Complexity_Score'], bins=10, alpha=0.7, color='skyblue', 
                    edgecolor='black', linewidth=1.2)
    axes[0, 0].set_title('Complexity Score Distribution', fontweight='bold')
    axes[0, 0].set_xlabel('Complexity Score (1-10 scale)')
    axes[0, 0].set_ylabel('Number of SQL Components')
    
    # Add mean line and statistics
    mean_complexity = df['Complexity_Score'].mean()
    axes[0, 0].axvline(mean_complexity, color='red', linestyle='--', linewidth=2,
                       label=f'Mean: {mean_complexity:.1f}')
    
    # Add complexity thresholds (drawn before legend() so they appear in it)
    axes[0, 0].axvline(6, color='orange', linestyle=':', alpha=0.7, label='Medium Risk')
    axes[0, 0].axvline(8, color='red', linestyle=':', alpha=0.7, label='High Risk')
    axes[0, 0].legend()
    
    # 2. Risk Level Distribution (Pie Chart)
    risk_counts = df['Risk_Level'].value_counts()
    colors = {'Low': '#2ecc71', 'Medium': '#f39c12', 'High': '#e74c3c'}
    risk_colors = [colors.get(risk, '#95a5a6') for risk in risk_counts.index]
    
    # explode needs one entry per wedge, so size it from the data
    wedges, texts, autotexts = axes[0, 1].pie(risk_counts.values, labels=risk_counts.index, 
                                              autopct='%1.1f%%', colors=risk_colors, 
                                              startangle=90, explode=[0.05] * len(risk_counts))
    axes[0, 1].set_title('Migration Risk Distribution', fontweight='bold')
    
    # Enhance pie chart text
    for autotext in autotexts:
        autotext.set_color('white')
        autotext.set_fontweight('bold')
    
    # 3. Effort vs Complexity Scatter Plot
    scatter = axes[1, 0].scatter(df['Complexity_Score'], df['Migration_Hours'], 
                                c=df['Lines_of_Code'], cmap='viridis', alpha=0.8, 
                                s=150, edgecolors='black', linewidth=0.5)
    
    axes[1, 0].set_xlabel('Complexity Score')
    axes[1, 0].set_ylabel('Migration Hours')
    axes[1, 0].set_title('Effort vs Complexity (color = Lines of Code)', fontweight='bold')
    
    # Add colorbar
    cbar = plt.colorbar(scatter, ax=axes[1, 0])
    cbar.set_label('Lines of Code', rotation=270, labelpad=20)
    
    # Add trend line
    z = np.polyfit(df['Complexity_Score'], df['Migration_Hours'], 1)
    p = np.poly1d(z)
    axes[1, 0].plot(df['Complexity_Score'], p(df['Complexity_Score']), "r--", alpha=0.7)
    
    # 4. Category-wise Migration Hours (Horizontal Bar Chart)
    category_hours = df.groupby('Category')['Migration_Hours'].sum().sort_values(ascending=True)
    bars = axes[1, 1].barh(category_hours.index, category_hours.values, 
                           color=['#3498db', '#e67e22', '#9b59b6'], alpha=0.8)
    
    axes[1, 1].set_xlabel('Total Migration Hours')
    axes[1, 1].set_title('Migration Effort by Category', fontweight='bold')
    
    # Add value labels on bars
    for i, (bar, value) in enumerate(zip(bars, category_hours.values)):
        axes[1, 1].text(value + 0.5, i, f'{int(value)}h', 
                         va='center', fontweight='bold')
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.93)
    plt.show()
    
    # Additional insights table
    print("\n📋 DETAILED COMPLEXITY BREAKDOWN:")
    print("-" * 80)
    
    # Show top 5 most complex components
    top_complex = df.nlargest(5, 'Complexity_Score')[['File_Name', 'Complexity_Score', 
                                                       'Migration_Hours', 'SQL_Features']]
    
    for _, row in top_complex.iterrows():
        print(f"📄 {row['File_Name']}:")
        print(f"   🎯 Complexity: {row['Complexity_Score']}/10")
        print(f"   ⏱️  Effort: {row['Migration_Hours']} hours")
        print(f"   🔧 Features: {row['SQL_Features']}")
        print()
    
else:
    print("⚠️ Complexity analysis data not available for visualization")

## 🔗 Dependency Analysis

Understanding dependencies is crucial for migration sequencing. This analysis shows which components depend on others and helps plan the migration order to minimize risk and ensure system stability throughout the process.

### Why Dependencies Matter:
- **Migration Sequencing** - Dependencies must be migrated before dependent components
- **Risk Management** - High-criticality dependencies require extra attention
- **Testing Strategy** - Dependent components need integrated testing
- **Rollback Planning** - Understanding dependencies helps plan safe rollback procedures

In [None]:
if 'Dependencies' in sheets_data:
    dep_df = sheets_data['Dependencies']
    
    print("🔗 DEPENDENCY ANALYSIS FOR MIGRATION PLANNING")
    print("="*70)
    
    if len(dep_df) > 0:
        print(f"📊 Total Dependencies Identified: {len(dep_df)}")
        
        # Analyze dependency criticality
        if 'Criticality' in dep_df.columns:
            criticality_counts = dep_df['Criticality'].value_counts()
            print("\n🚦 Dependency Criticality Breakdown:")
            for crit, count in criticality_counts.items():
                crit_emoji = {'High': '🔴', 'Medium': '🟡', 'Low': '🟢'}
                percentage = (count / len(dep_df) * 100)
                print(f"   {crit_emoji.get(crit, '⚪')} {crit}: {count} dependencies ({percentage:.1f}%)")
        
        # Analyze dependency types
        if 'Dependency_Type' in dep_df.columns:
            type_counts = dep_df['Dependency_Type'].value_counts()
            print("\n📋 Dependency Types:")
            for dep_type, count in type_counts.items():
                print(f"   📊 {dep_type}: {count} dependencies")
        
        print("\n🔗 Critical Dependency Chains:")
        print("-" * 50)
        
        # Show high-criticality dependencies first
        high_crit_deps = dep_df[dep_df['Criticality'] == 'High'] if 'Criticality' in dep_df.columns else dep_df.head(5)
        
        for _, row in high_crit_deps.iterrows():
            source = row['Source_Object'].replace('.sql', '')
            target = row['Target_Object'].replace('.sql', '')
            dep_type = row.get('Dependency_Type', 'Unknown')
            criticality = row.get('Criticality', 'Unknown')
            
            print(f"   {source} ➜ {target}")
            print(f"     Type: {dep_type} | Criticality: {criticality}")
            print()
        
        # Migration sequencing recommendations
        print("📋 MIGRATION SEQUENCING RECOMMENDATIONS:")
        print("-" * 50)
        
        # Find components with no dependencies (good starting points)
        all_sources = set(dep_df['Source_Object'])
        all_targets = set(dep_df['Target_Object'])
        independent_components = all_sources - all_targets
        
        if independent_components:
            print("🏁 Start with these independent components:")
            for comp in sorted(independent_components)[:3]:
                print(f"   ✅ {comp.replace('.sql', '')}")
        
        # Find components that many others depend on
        dependency_counts = dep_df['Source_Object'].value_counts()
        if len(dependency_counts) > 0:
            print("\n🎯 High-priority components (many dependencies):")
            for comp, count in dependency_counts.head(3).items():
                print(f"   🔴 {comp.replace('.sql', '')} ({count} dependent components)")
    
    # Create dependency visualizations
    if len(dep_df) > 0:
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        fig.suptitle('GlobalSupply Corp - Dependency Analysis', fontsize=14, fontweight='bold')
        
        # 1. Criticality distribution
        if 'Criticality' in dep_df.columns:
            criticality_counts = dep_df['Criticality'].value_counts()
            colors = {'High': '#e74c3c', 'Medium': '#f39c12', 'Low': '#2ecc71'}
            crit_colors = [colors.get(crit, '#95a5a6') for crit in criticality_counts.index]
            
            wedges, texts, autotexts = ax1.pie(criticality_counts.values, 
                                               labels=criticality_counts.index,
                                               autopct='%1.1f%%', colors=crit_colors,
                                               startangle=90,
                                               explode=[0.05] * len(criticality_counts))
            ax1.set_title('Dependency Criticality', fontweight='bold')
            
            for autotext in autotexts:
                autotext.set_color('white')
                autotext.set_fontweight('bold')
        
        # 2. Dependency type distribution
        if 'Dependency_Type' in dep_df.columns:
            type_counts = dep_df['Dependency_Type'].value_counts()
            bars = ax2.bar(type_counts.index, type_counts.values, 
                          color=['#3498db', '#e67e22', '#9b59b6', '#1abc9c'][:len(type_counts)])
            
            ax2.set_title('Dependencies by Type', fontweight='bold')
            ax2.set_xlabel('Dependency Type')
            ax2.set_ylabel('Count')
            ax2.tick_params(axis='x', rotation=45)
            
            # Add value labels on bars
            for bar, value in zip(bars, type_counts.values):
                ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                         str(value), ha='center', va='bottom', fontweight='bold')
        
        plt.tight_layout()
        plt.show()
    
else:
    print("⚠️ Dependency analysis data not available")
    print("💡 In a real assessment, Lakebridge would identify:")
    print("   • Table-to-table dependencies")
    print("   • View-to-table relationships")
    print("   • Stored procedure call chains")
    print("   • Cross-database dependencies")

## 🔧 SQL Function Analysis

This analysis examines SQL functions and features used in the legacy code to understand:
- **Databricks Compatibility** - Which functions translate directly vs need modification
- **Complexity Drivers** - Functions that contribute most to migration complexity
- **Focus Areas** - Where development effort should be concentrated
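The "Focus Areas" question combines how often a function appears with how hard it is to translate. A simple weighting multiplies `Usage_Count` by `Complexity_Impact` and keeps only functions that are not a direct translation. This product score is an assumption for illustration, not a Lakebridge metric, and `migration_focus` is a hypothetical helper.

```python
import pandas as pd


def migration_focus(func_df, top_n=5):
    """Rank non-Direct functions by Usage_Count * Complexity_Impact (an assumed weighting)."""
    needs_work = func_df[func_df["Databricks_Compatibility"] != "Direct"].copy()
    needs_work["Priority"] = needs_work["Usage_Count"] * needs_work["Complexity_Impact"]
    return needs_work.nlargest(top_n, "Priority")[["Function_Name", "Priority"]]


# A subset of the sample function statistics
sample = pd.DataFrame({
    "Function_Name": ["DATEDIFF", "DATEADD", "STRING_AGG", "PIVOT", "RECURSIVE_CTE", "CTE"],
    "Usage_Count": [18, 14, 5, 3, 2, 18],
    "Complexity_Impact": [2, 2, 4, 5, 5, 3],
    "Databricks_Compatibility": ["Modified", "Modified", "Modified", "Modified", "Complex", "Direct"],
})
print(migration_focus(sample).to_string(index=False))
```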

In [None]:
if 'Function_Usage' in sheets_data:
    func_df = sheets_data['Function_Usage']
    
    print("🔧 SQL FUNCTION COMPATIBILITY ANALYSIS")
    print("="*70)
    
    # Overall statistics
    total_functions = func_df['Usage_Count'].sum()
    unique_functions = len(func_df)
    
    print(f"📊 Total Function Usages: {total_functions}")
    print(f"🔧 Unique Functions: {unique_functions}")
    
    # Compatibility breakdown
    if 'Databricks_Compatibility' in func_df.columns:
        compat_counts = func_df['Databricks_Compatibility'].value_counts()
        print("\n🎯 DATABRICKS COMPATIBILITY BREAKDOWN:")
        
        compat_emojis = {
            'Direct': '✅ Direct translation',
            'Modified': '🔄 Requires modification', 
            'Complex': '⚠️ Complex migration',
            'Manual': '🛠️ Manual rewrite needed'
        }
        
        for compat, count in compat_counts.items():
            percentage = (count / len(func_df) * 100)
            emoji_desc = compat_emojis.get(compat, f'🔍 {compat}')
            print(f"   {emoji_desc}: {count} functions ({percentage:.1f}%)")
    
    # Top complexity drivers
    if 'Complexity_Impact' in func_df.columns:
        print("\n🎯 TOP COMPLEXITY DRIVERS:")
        complexity_drivers = func_df.nlargest(5, 'Complexity_Impact')
        
        for _, row in complexity_drivers.iterrows():
            function = row['Function_Name']
            usage = row['Usage_Count']
            complexity = row['Complexity_Impact']
            compat = row.get('Databricks_Compatibility', 'Unknown')
            
            print(f"   🔧 {function}:")
            print(f"      Used {usage} times | Complexity: {complexity}/5 | Compatibility: {compat}")
    
    # Most used functions
    print("\n📈 MOST FREQUENTLY USED FUNCTIONS:")
    top_used = func_df.nlargest(5, 'Usage_Count')
    
    for _, row in top_used.iterrows():
        function = row['Function_Name']
        usage = row['Usage_Count']
        compat = row.get('Databricks_Compatibility', 'Unknown')
        
        status_emoji = {'Direct': '✅', 'Modified': '🔄', 'Complex': '⚠️'}.get(compat, '🔍')
        print(f"   {status_emoji} {function}: {usage} usages ({compat})")
    
    # Create function analysis visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 10))
    fig.suptitle('SQL Function Usage & Compatibility Analysis', fontsize=14, fontweight='bold')
    
    # 1. Compatibility distribution
    if 'Databricks_Compatibility' in func_df.columns:
        compat_counts = func_df['Databricks_Compatibility'].value_counts()
        colors = {'Direct': '#2ecc71', 'Modified': '#f39c12', 'Complex': '#e74c3c', 'Manual': '#8e44ad'}
        compat_colors = [colors.get(comp, '#95a5a6') for comp in compat_counts.index]
        
        ax1.pie(compat_counts.values, labels=compat_counts.index, autopct='%1.1f%%',
                colors=compat_colors, startangle=90)
        ax1.set_title('Compatibility Distribution')
    
    # 2. Top used functions
    top_used = func_df.nlargest(8, 'Usage_Count')
    bars = ax2.barh(top_used['Function_Name'], top_used['Usage_Count'], color='skyblue')
    ax2.set_title('Most Used Functions')
    ax2.set_xlabel('Usage Count')
    
    # Add value labels
    for bar, value in zip(bars, top_used['Usage_Count']):
        ax2.text(bar.get_width() + 0.3, bar.get_y() + bar.get_height()/2,
                 str(value), va='center', fontweight='bold')
    
    # 3. Complexity vs Usage scatter
    if 'Complexity_Impact' in func_df.columns:
        scatter = ax3.scatter(func_df['Usage_Count'], func_df['Complexity_Impact'],
                             alpha=0.7, s=100, color='coral')
        ax3.set_xlabel('Usage Count')
        ax3.set_ylabel('Complexity Impact (1-5)')
        ax3.set_title('Usage vs Complexity')
        ax3.grid(True, alpha=0.3)
    
    # 4. Key migration insights (text panel)
    ax4.text(0.1, 0.8, "KEY MIGRATION INSIGHTS:", transform=ax4.transAxes, 
             fontsize=12, fontweight='bold')
    
    insights = [
        "• Focus on high-usage complex functions first",
        "• Test modified functions thoroughly", 
        "• Consider performance implications",
        "• Plan for user training on new syntax"
    ]
    
    for i, insight in enumerate(insights):
        ax4.text(0.1, 0.6 - i*0.1, insight, transform=ax4.transAxes, fontsize=10)
    
    ax4.axis('off')
    
    plt.tight_layout()
    plt.show()
    
else:
    print("⚠️ Function usage data not available")
    print("💡 In a real assessment, this would show:")
    print("   • SQL Server specific functions used")
    print("   • Databricks compatibility status")
    print("   • Migration complexity by function type")

## 📋 Migration Planning Dashboard

This section provides actionable insights for planning the migration, including:
- **Migration Wave Strategy** - Phased approach based on complexity and risk
- **Resource Planning** - Team size and timeline estimates
- **Risk Mitigation** - Specific recommendations for high-risk components
- **Cost-Benefit Analysis** - Investment vs expected returns
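The planning cell assigns waves with a row-wise `df.apply(assign_migration_wave, axis=1)`. That is easy to read but slow on large inventories. You can express the same rules with `np.select`. This sketch uses a hypothetical `assign_waves` name and mirrors the existing thresholds rather than proposing new ones.

```python
import numpy as np
import pandas as pd


def assign_waves(frame):
    """Vectorized version of the wave rules used in the planning cell.

    Wave 1: Low risk and complexity < 6; Wave 2: Medium risk, or Low risk with
    complexity >= 6; Wave 3: everything else (High or unrecognized risk levels).
    """
    low = frame["Risk_Level"] == "Low"
    medium = frame["Risk_Level"] == "Medium"
    simple = frame["Complexity_Score"] < 6
    return pd.Series(
        np.select(
            [low & simple, medium | (low & ~simple)],
            ["Wave 1 - Quick Wins", "Wave 2 - Standard Migration"],
            default="Wave 3 - Complex Components",
        ),
        index=frame.index,
    )


sample = pd.DataFrame({
    "Risk_Level": ["Low", "Low", "Medium", "High"],
    "Complexity_Score": [4.5, 6.5, 8.5, 9.8],
})
print(assign_waves(sample).tolist())
```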

In [None]:
if 'Complexity_Analysis' in sheets_data:
    df = sheets_data['Complexity_Analysis']
    
    print("📋 GLOBALSUPPLY CORP - MIGRATION PLANNING STRATEGY")
    print("="*80)
    
    # Define migration waves based on complexity and risk
    def assign_migration_wave(row):
        """
        Assign components to migration waves based on complexity and risk
        Wave 1: Low risk and complexity < 6 (Quick Wins)
        Wave 2: Medium risk, or Low risk with complexity >= 6 (Standard Migration)
        Wave 3: High risk or any other risk level (Complex Components)
        """
        if row['Risk_Level'] == 'Low' and row['Complexity_Score'] < 6:
            return 'Wave 1 - Quick Wins'
        elif row['Risk_Level'] == 'Medium' or (row['Risk_Level'] == 'Low' and row['Complexity_Score'] >= 6):
            return 'Wave 2 - Standard Migration'
        else:
            return 'Wave 3 - Complex Components'
    
    df['Migration_Wave'] = df.apply(assign_migration_wave, axis=1)
    
    # Analyze migration waves
    wave_analysis = df.groupby('Migration_Wave').agg({
        'File_Name': 'count',
        'Migration_Hours': 'sum', 
        'Complexity_Score': 'mean',
        'Lines_of_Code': 'sum'
    }).round(1)
    
    wave_analysis.columns = ['File_Count', 'Total_Hours', 'Avg_Complexity', 'Total_LOC']
    wave_analysis = wave_analysis.reindex([
        'Wave 1 - Quick Wins', 
        'Wave 2 - Standard Migration', 
        'Wave 3 - Complex Components'
    ])
    
    print("🌊 RECOMMENDED MIGRATION WAVE STRATEGY:")
    print("-" * 80)
    
    wave_descriptions = {
        'Wave 1 - Quick Wins': {
            'emoji': '🟢',
            'description': 'Low risk, straightforward migrations to build momentum',
            'timeline': '2-4 weeks',
            'team': '1-2 developers'
        },
        'Wave 2 - Standard Migration': {
            'emoji': '🟡', 
            'description': 'Standard complexity migrations with moderate risk',
            'timeline': '4-8 weeks',
            'team': '2-3 developers'
        },
        'Wave 3 - Complex Components': {
            'emoji': '🔴',
            'description': 'High complexity/risk components requiring expert attention',
            'timeline': '6-12 weeks',
            'team': '3-4 senior developers'
        }
    }
    
    for wave, row in wave_analysis.iterrows():
        if pd.isna(row['File_Count']):
            continue
            
        wave_info = wave_descriptions.get(wave, {'emoji': '⚪', 'description': '', 'timeline': '', 'team': ''})
        
        print(f"{wave_info['emoji']} {wave}:")
        print(f"   📄 Components: {int(row['File_Count'])}")
        print(f"   ⏱️  Total Hours: {int(row['Total_Hours'])}")
        print(f"   📊 Avg Complexity: {row['Avg_Complexity']}/10")
        print(f"   📏 Lines of Code: {int(row['Total_LOC']):,}")
        print(f"   📅 Timeline: {wave_info['timeline']}")
        print(f"   👥 Team Size: {wave_info['team']}")
        print(f"   💡 Strategy: {wave_info['description']}")
        print()
    
    # High-risk component analysis
    high_risk_files = df[df['Risk_Level'] == 'High']
    
    if len(high_risk_files) > 0:
        print("⚠️  HIGH-RISK COMPONENTS - SPECIAL ATTENTION REQUIRED:")
        print("-" * 80)
        
        for _, file_data in high_risk_files.iterrows():
            print(f"🔴 {file_data['File_Name']}:")
            print(f"   🎯 Complexity: {file_data['Complexity_Score']}/10")
            print(f"   ⏱️  Effort: {file_data['Migration_Hours']} hours")
            print(f"   🔧 Features: {file_data.get('SQL_Features', 'Advanced SQL patterns')}")
            
            # Specific recommendations based on complexity
            if file_data['Complexity_Score'] > 9:
                print(f"   💡 Recommendation: Assign senior architect, plan proof-of-concept")
            else:
                print(f"   💡 Recommendation: Assign senior developer, thorough testing required")
            print()
    
    # Cost-benefit analysis
    total_hours = df['Migration_Hours'].sum()
    avg_hourly_rate = 150
    total_cost = total_hours * avg_hourly_rate
    
    print("💰 COST-BENEFIT ANALYSIS:")
    print("-" * 80)
    print(f"📊 Total Migration Effort: {total_hours} hours")
    print(f"💵 Estimated Cost: ${total_cost:,.0f} (@ ${avg_hourly_rate}/hour)")
    print(f"📅 Sequential Timeline: {total_hours/40:.1f} weeks (1 developer)")
    print(f"📅 Parallel Timeline: {total_hours/160:.1f} weeks (4 developers)")
    print(f"📅 Recommended Timeline: {total_hours/120:.1f} weeks (3 developers, excluding coordination overhead)")
    
    # Expected benefits
    print("\n📈 EXPECTED BUSINESS BENEFITS:")
    print("-" * 50)
    
    benefits = [
        ("Query Performance", "3-5x improvement", "Faster analytics, better user experience"),
        ("Analytics Capability", "Advanced ML/AI", "Predictive supply chain optimization"),
        ("Infrastructure Cost", "20-30% reduction", "Cloud-native scaling and optimization"),
        ("Time-to-Insight", "10x faster", "Natural language queries with Genie"),
        ("Scalability", "Elastic scaling", "Handle peak loads without performance issues"),
        ("Innovation Speed", "2x faster", "Rapid prototyping of new analytics")
    ]
    
    for benefit, improvement, description in benefits:
        print(f"💡 {benefit}: {improvement}")
        print(f"   {description}")
        print()
    
    # ROI calculation
    annual_savings = 200000  # Assumed annual savings (placeholder; replace with your own estimate)
    payback_months = (total_cost / (annual_savings / 12))
    
    print(f"💰 ROI PROJECTION:")
    print(f"   Annual Savings Estimate: ${annual_savings:,}")
    print(f"   Payback Period: {payback_months:.1f} months")
    print(f"   3-Year ROI: {((annual_savings * 3 - total_cost) / total_cost * 100):.0f}%")
    
else:
    print("⚠️ Migration planning data not available")
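
The cost-benefit cell above chains several back-of-the-envelope formulas (cost, team-size timelines, payback, three-year ROI). A minimal sketch that collects them in one pure function so the assumptions are explicit and testable. `migration_economics` is a hypothetical helper name; the $150/hour rate and $200,000 annual savings are the notebook's own hardcoded assumptions, and `discount_rate` is an added, hypothetical parameter for comparing against a discounted NPV.

```python
def migration_economics(total_hours, hourly_rate=150, annual_savings=200_000,
                        hours_per_week=40, years=3, discount_rate=None):
    """Collect the notebook's cost, timeline, and ROI arithmetic in one place.

    hourly_rate and annual_savings default to the notebook's hardcoded
    assumptions; discount_rate is a hypothetical extra for a discounted NPV.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    total_cost = total_hours * hourly_rate
    net_benefit = annual_savings * years - total_cost  # undiscounted
    result = {
        "total_cost": total_cost,
        # Ideal team-size splits: no coordination overhead is modeled.
        "weeks_1_dev": total_hours / hours_per_week,
        "weeks_3_devs": total_hours / (3 * hours_per_week),
        "weeks_4_devs": total_hours / (4 * hours_per_week),
        # Months of savings needed to recover the one-time migration cost.
        "payback_months": total_cost / (annual_savings / 12),
        "net_benefit": net_benefit,
        "roi_pct": net_benefit / total_cost * 100,
    }
    if discount_rate is not None:
        # Discount each year's savings back to today (end-of-year cash flows).
        pv_savings = sum(annual_savings / (1 + discount_rate) ** year
                         for year in range(1, years + 1))
        result["npv"] = pv_savings - total_cost
    return result
```

For example, `migration_economics(480)` gives a $72,000 cost, a 4.32-month payback, and a 733% three-year ROI. Passing `discount_rate=0.08` adds an `npv` entry that comes out below `net_benefit`, which is why an undiscounted three-year total should not be labeled as a net present value.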

## 📊 Migration Wave Visualization

Visual representation of the recommended migration strategy showing how components are distributed across waves and the associated effort and risk levels.
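
The first two panels depend on every component carrying a `Migration_Wave` label and on per-wave totals being reindexed to a fixed order. A minimal sketch of both steps, assuming the column names this notebook already reads (`Risk_Level`, `Complexity_Score`, `Migration_Hours`, `File_Name`). `assign_waves` and `wave_totals` are hypothetical helpers that restate the `assign_migration_wave` rule from the export cell in vectorized form with `np.select`.

```python
import numpy as np
import pandas as pd

WAVE_ORDER = [
    "Wave 1 - Quick Wins",
    "Wave 2 - Standard Migration",
    "Wave 3 - Complex Components",
]

def assign_waves(df: pd.DataFrame) -> pd.Series:
    """Vectorized restatement of the notebook's assign_migration_wave rule."""
    low = df["Risk_Level"] == "Low"
    medium = df["Risk_Level"] == "Medium"
    conditions = [
        low & (df["Complexity_Score"] < 6),              # Wave 1: low risk, simple
        medium | (low & (df["Complexity_Score"] >= 6)),  # Wave 2: medium, or low but complex
    ]
    # Everything else (High risk or any other label) falls through to Wave 3.
    labels = np.select(conditions, WAVE_ORDER[:2], default=WAVE_ORDER[2])
    return pd.Series(labels, index=df.index, name="Migration_Wave")

def wave_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Component count and hours per wave, in fixed order, zero-filled."""
    summary = (df.assign(Migration_Wave=assign_waves(df))
                 .groupby("Migration_Wave")
                 .agg(File_Count=("File_Name", "count"),
                      Total_Hours=("Migration_Hours", "sum")))
    # reindex keeps a row (and therefore a bar) for every wave, even an empty one.
    return summary.reindex(WAVE_ORDER).fillna(0)
```

On a DataFrame with no High-risk rows, `wave_totals` still returns three rows with zeros for Wave 3, which keeps the three bars aligned with the fixed tick labels in the dashboard.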

In [None]:
if 'Complexity_Analysis' in sheets_data and 'Migration_Wave' in df.columns:
    # Create comprehensive migration planning visualization
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    fig.suptitle('GlobalSupply Corp - Migration Wave Strategy Dashboard', 
                 fontsize=16, fontweight='bold')
    
    # 1. Migration wave distribution
    wave_counts = df['Migration_Wave'].value_counts()
    wave_order = ['Wave 1 - Quick Wins', 'Wave 2 - Standard Migration', 'Wave 3 - Complex Components']
    wave_counts = wave_counts.reindex(wave_order).fillna(0)
    
    colors = ['#2ecc71', '#f39c12', '#e74c3c']
    bars1 = ax1.bar(range(len(wave_counts)), wave_counts.values, color=colors)
    ax1.set_title('🌊 Components by Migration Wave')
    ax1.set_xlabel('Migration Wave')
    ax1.set_ylabel('Number of Components')
    ax1.set_xticks(range(len(wave_counts)))
    ax1.set_xticklabels(['Wave 1\nQuick Wins', 'Wave 2\nStandard', 'Wave 3\nComplex'], rotation=0)
    
    # Add value labels on bars
    for bar, value in zip(bars1, wave_counts.values):
        if value > 0:
            ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                     str(int(value)), ha='center', va='bottom', fontweight='bold')
    
    # 2. Effort distribution by wave
    wave_hours = df.groupby('Migration_Wave')['Migration_Hours'].sum().reindex(wave_order).fillna(0)
    bars2 = ax2.bar(range(len(wave_hours)), wave_hours.values, color=colors)
    ax2.set_title('⏱️ Effort Distribution by Wave')
    ax2.set_xlabel('Migration Wave')
    ax2.set_ylabel('Migration Hours')
    ax2.set_xticks(range(len(wave_hours)))
    ax2.set_xticklabels(['Wave 1\nQuick Wins', 'Wave 2\nStandard', 'Wave 3\nComplex'], rotation=0)
    
    # Add value labels
    for bar, value in zip(bars2, wave_hours.values):
        if value > 0:
            ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
                     f'{int(value)}h', ha='center', va='bottom', fontweight='bold')
    
    # 3. Risk vs Effort scatter plot by wave
    wave_colors = {'Wave 1 - Quick Wins': '#2ecc71', 
                   'Wave 2 - Standard Migration': '#f39c12',
                   'Wave 3 - Complex Components': '#e74c3c'}
    
    for wave in df['Migration_Wave'].unique():
        wave_data = df[df['Migration_Wave'] == wave]
        ax3.scatter(wave_data['Complexity_Score'], wave_data['Migration_Hours'],
                   label=wave.replace(' - ', '\n'), color=wave_colors.get(wave, '#95a5a6'),
                   alpha=0.7, s=100, edgecolors='black', linewidth=0.5)
    
    ax3.set_xlabel('Complexity Score')
    ax3.set_ylabel('Migration Hours')
    ax3.set_title('📊 Risk vs Effort by Wave')
    ax3.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    ax3.grid(True, alpha=0.3)
    
    # 4. Timeline visualization
    # Gantt-like chart of an illustrative schedule: durations use the upper
    # bound of each wave's timeline (4, 8, 12 weeks), with waves overlapping
    wave_timeline = {
        'Wave 1 - Quick Wins': {'start': 0, 'duration': 4, 'color': '#2ecc71'},
        'Wave 2 - Standard Migration': {'start': 2, 'duration': 8, 'color': '#f39c12'},
        'Wave 3 - Complex Components': {'start': 6, 'duration': 12, 'color': '#e74c3c'}
    }
    
    y_pos = 0
    for wave, timeline in wave_timeline.items():
        ax4.barh(y_pos, timeline['duration'], left=timeline['start'], 
                color=timeline['color'], alpha=0.7, height=0.6)
        
        # Add wave labels
        ax4.text(timeline['start'] + timeline['duration']/2, y_pos,
                wave.split(' - ')[0], ha='center', va='center', 
                fontweight='bold', color='white')
        y_pos += 1
    
    ax4.set_xlim(0, 18)
    ax4.set_ylim(-0.5, 2.5)
    ax4.set_xlabel('Timeline (Weeks)')
    ax4.set_title('📅 Recommended Migration Timeline')
    ax4.set_yticks(range(3))
    ax4.set_yticklabels(['Wave 1', 'Wave 2', 'Wave 3'])
    ax4.grid(True, alpha=0.3, axis='x')
    
    # Add milestone markers
    milestones = [(4, 'Wave 1 Complete'), (10, 'Wave 2 Complete'), (18, 'Migration Complete')]
    for week, milestone in milestones:
        ax4.axvline(x=week, color='red', linestyle='--', alpha=0.5)
        ax4.text(week, 2.2, milestone, rotation=45, ha='right', va='bottom', fontsize=8)
    
    plt.tight_layout()
    plt.show()
    
    print("\n📋 MIGRATION WAVE SUMMARY:")
    print("="*60)
    
    total_components = len(df)
    for wave in wave_order:
        wave_data = df[df['Migration_Wave'] == wave]
        if len(wave_data) > 0:
            count = len(wave_data)
            percentage = (count / total_components * 100)
            avg_complexity = wave_data['Complexity_Score'].mean()
            total_effort = wave_data['Migration_Hours'].sum()
            
            print(f"{wave}:")
            print(f"  📊 {count} components ({percentage:.1f}% of total)")
            print(f"  📈 Average complexity: {avg_complexity:.1f}/10")
            print(f"  ⏱️  Total effort: {total_effort} hours")
            print()
            
else:
    print("⚠️ Migration wave data not available for visualization")
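
The timeline panel hardcodes both the wave schedule and the milestone weeks (4, 10, 18), so editing one duration leaves the dashed milestone lines out of sync. A minimal sketch that derives the milestones from the schedule instead. `WAVE_SCHEDULE` copies the illustrative start and duration values from the cell above, and `schedule_milestones` is a hypothetical helper.

```python
# Illustrative schedule from the dashboard cell: start week and duration per wave.
WAVE_SCHEDULE = {
    "Wave 1 - Quick Wins": {"start": 0, "duration": 4},
    "Wave 2 - Standard Migration": {"start": 2, "duration": 8},
    "Wave 3 - Complex Components": {"start": 6, "duration": 12},
}

def schedule_milestones(schedule):
    """Derive (week, label) milestones from each wave's start + duration."""
    # End week of each wave, paired with its short name ("Wave 1", ...).
    ends = sorted(
        (slot["start"] + slot["duration"], wave.split(" - ")[0])
        for wave, slot in schedule.items()
    )
    finish = max(week for week, _ in ends)
    # The wave that finishes last marks overall completion of the migration.
    return [
        (week, "Migration Complete" if week == finish else f"{name} Complete")
        for week, name in ends
    ]
```

With the schedule above this reproduces the dashboard's three milestones. In the plotting code, the returned weeks could drive `ax4.axvline`, and the largest week could replace the fixed upper limit in `ax4.set_xlim(0, 18)`.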

## 📝 Executive Summary & Next Steps

Based on the comprehensive assessment analysis, here are the key findings and actionable next steps for GlobalSupply Corp's data modernization journey to Databricks.
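
The summary and export cells each recompute the same headline figures with repeated boolean filters and a hardcoded $150 rate. A minimal sketch of computing them once; `summary_metrics` is a hypothetical helper, and the column names are the ones the notebook already reads from the `Complexity_Analysis` sheet.

```python
import pandas as pd

def summary_metrics(df: pd.DataFrame, hourly_rate: float = 150) -> dict:
    """Headline numbers used by the executive summary and export cells."""
    # value_counts + reindex yields 0 (not a missing key) for absent risk levels.
    risk_counts = (df["Risk_Level"].value_counts()
                   .reindex(["High", "Medium", "Low"], fill_value=0))
    total_hours = df["Migration_Hours"].sum()
    return {
        "total_files": len(df),
        "total_loc": int(df["Lines_of_Code"].sum()),
        "total_hours": total_hours,
        "total_cost": total_hours * hourly_rate,  # labor only
        "avg_complexity": round(df["Complexity_Score"].mean(), 1),
        "risk_counts": risk_counts.to_dict(),
    }
```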

In [None]:
# Generate executive summary and actionable recommendations
if 'Complexity_Analysis' in sheets_data:
    df = sheets_data['Complexity_Analysis']
    
    # Calculate key metrics for summary
    total_files = len(df)
    total_hours = df['Migration_Hours'].sum()
    total_cost = total_hours * 150
    high_risk_count = len(df[df['Risk_Level'] == 'High'])
    avg_complexity = df['Complexity_Score'].mean()
    
    print("""
═══════════════════════════════════════════════════════════════════════════════
📊 GLOBALSUPPLY CORP - EXECUTIVE SUMMARY & STRATEGIC RECOMMENDATIONS
═══════════════════════════════════════════════════════════════════════════════

🎯 ASSESSMENT FINDINGS:
""")

    # Key findings with specific numbers
    findings = [
        f"• {total_files} SQL components analyzed with {df['Lines_of_Code'].sum():,} lines of code",
        f"• Average complexity score: {avg_complexity:.1f}/10",
        f"• {high_risk_count} high-risk components requiring expert attention",
        f"• Estimated migration effort: {total_hours} hours (${total_cost:,})",
        f"• Strong dependency relationships requiring careful sequencing"
    ]
    
    for finding in findings:
        print(finding)

    print("""
📈 BUSINESS IMPACT & ROI:
• Query Performance: 3-5x improvement for analytical workloads
• Natural Language Queries: Enable business users with Databricks Genie
• ML/AI Capabilities: Advanced supply chain optimization and forecasting
• Infrastructure Costs: 20-30% reduction through cloud-native optimization
• Time-to-Insight: 10x faster analytics development and deployment
• Scalability: Elastic compute scaling for peak demand scenarios

💰 FINANCIAL PROJECTIONS:
• Investment Required: ${:,.0f}
• Expected Annual Savings: $200,000+
• Payback Period: {:.1f} months
• 3-Year ROI: {}%""".format(
        total_cost, 
        (total_cost / (200000 / 12)), 
        int(((200000 * 3 - total_cost) / total_cost * 100))
    ))

    print("""
🚀 STRATEGIC RECOMMENDATIONS:

1. ✅ IMMEDIATE ACTIONS (Next 2 weeks):
   → Secure executive sponsorship and budget approval
   → Assemble migration team with SQL Server + Databricks expertise
   → Set up Databricks workspace and development environment
   → Begin with Module 2: Schema Migration & Transpilation workshop

2. 📋 SHORT-TERM EXECUTION (4-6 weeks):
   → Execute Wave 1 migrations (low complexity, quick wins)
   → Establish CI/CD pipelines for automated testing
   → Begin user training on Databricks platform
   → Proceed to Module 3: Data Reconciliation workshop

3. 🎯 MEDIUM-TERM DELIVERY (2-3 months):
   → Complete Wave 2 & 3 migrations with thorough testing
   → Implement advanced analytics and ML models
   → Deploy natural language query capabilities
   → Complete Module 4: Modern Analytics & ML workshop

4. 🌟 LONG-TERM OPTIMIZATION (3-6 months):
   → Optimize performance and cost efficiency
   → Expand ML/AI use cases across supply chain
   → Train business users on self-service analytics
   → Plan for additional data sources and use cases

🛠️ CRITICAL SUCCESS FACTORS:
• Strong project management with clear milestones
• Dedicated team with both SQL Server and Databricks skills
• Comprehensive testing strategy including data validation
• User training and change management program
• Phased rollout with fallback procedures

📞 RECOMMENDED SUPPORT RESOURCES:
• Databricks Professional Services for complex components
• Lakebridge community and documentation
• Partner ecosystem for specialized migration expertise
• Training programs for team skill development""")

    # Risk mitigation strategies
    if high_risk_count > 0:
        print(f"""
⚠️  RISK MITIGATION FOR {high_risk_count} HIGH-RISK COMPONENTS:
• Assign senior architects to complex components
• Develop proof-of-concepts for high-risk migrations
• Plan for manual testing and validation
• Consider parallel runs during transition period
• Maintain rollback procedures for critical systems""")

    print("""
═══════════════════════════════════════════════════════════════════════════════
🎯 DECISION: PROCEED WITH DATABRICKS MIGRATION

The assessment demonstrates a strong business case for migrating GlobalSupply Corp's
data warehouse to Databricks. The combination of performance improvements, cost 
savings, and advanced analytics capabilities provides compelling ROI.

Next Workshop Module: Schema Migration & Transpilation
═══════════════════════════════════════════════════════════════════════════════
""")

else:
    print("⚠️ Assessment data not available for executive summary")
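
The risk-mitigation block above prints the same advice for every high-risk component. A minimal sketch that turns the planning cell's per-component rule (complexity above 9 gets a senior architect and a proof-of-concept, otherwise a senior developer with thorough testing) into a sorted table that can be handed to the team. `high_risk_plan` and `architect_threshold` are hypothetical names; the threshold default mirrors the notebook's `> 9` check.

```python
import pandas as pd

def high_risk_plan(df: pd.DataFrame, architect_threshold: float = 9) -> pd.DataFrame:
    """High-risk components, hardest first, tagged with a staffing recommendation."""
    high = df[df["Risk_Level"] == "High"].copy()
    # Same cut-off as the planning cell: complexity above the threshold -> architect-led PoC.
    needs_architect = high["Complexity_Score"] > architect_threshold
    high["Recommendation"] = needs_architect.map({
        True: "Senior architect + proof-of-concept",
        False: "Senior developer + thorough testing",
    })
    ordered = high.sort_values(["Complexity_Score", "Migration_Hours"], ascending=False)
    return ordered[["File_Name", "Complexity_Score", "Migration_Hours", "Recommendation"]]
```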

## 📤 Export Results for Stakeholders

Generate reports for different stakeholder groups:
- **Executive Summary** - High-level findings and recommendations
- **Technical Report** - Detailed migration plan with component breakdown
- **Project Plan** - Timeline, resources, and milestone tracking
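
The three deliverables map onto one function that writes them to a single folder. A minimal sketch, assuming a hypothetical `export_deliverables` helper and an output directory of your choosing; the file names and the `Migration_Wave`, `File_Name`, `Migration_Hours`, and `Complexity_Score` columns are the ones the export cell already uses.

```python
from pathlib import Path

import pandas as pd

def export_deliverables(df: pd.DataFrame, summary_text: str, out_dir) -> list:
    """Write the three stakeholder deliverables into out_dir and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    # Technical report: one row per SQL component.
    plan_path = out / "globalsupply_detailed_migration_plan.csv"
    df.to_csv(plan_path, index=False)

    # Executive summary: explicit UTF-8 so bullet characters survive any OS default.
    summary_path = out / "globalsupply_executive_summary.txt"
    summary_path.write_text(summary_text, encoding="utf-8")

    # Project plan: one row per migration wave.
    waves_path = out / "globalsupply_project_waves.csv"
    waves = df.groupby("Migration_Wave").agg(
        File_Count=("File_Name", "count"),
        Total_Hours=("Migration_Hours", "sum"),
        Avg_Complexity=("Complexity_Score", "mean"),
    ).round(1)
    waves.to_csv(waves_path)
    return [plan_path, summary_path, waves_path]
```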

In [None]:
# Export comprehensive results for different stakeholder groups
if 'Complexity_Analysis' in sheets_data:
    df = sheets_data['Complexity_Analysis']
    
    # Ensure migration wave assignments exist
    if 'Migration_Wave' not in df.columns:
        def assign_migration_wave(row):
            if row['Risk_Level'] == 'Low' and row['Complexity_Score'] < 6:
                return 'Wave 1 - Quick Wins'
            elif row['Risk_Level'] == 'Medium' or (row['Risk_Level'] == 'Low' and row['Complexity_Score'] >= 6):
                return 'Wave 2 - Standard Migration'
            else:
                return 'Wave 3 - Complex Components'
        df['Migration_Wave'] = df.apply(assign_migration_wave, axis=1)
    
    # 1. Export detailed technical migration plan
    try:
        df.to_csv('globalsupply_detailed_migration_plan.csv', index=False)
        print("✅ Detailed technical plan exported: globalsupply_detailed_migration_plan.csv")
    except Exception as e:
        print(f"⚠️ Could not export CSV: {e}")
    
    # 2. Create executive summary document
    total_files = len(df)
    total_hours = df['Migration_Hours'].sum()
    total_cost = total_hours * 150
    high_risk_count = len(df[df['Risk_Level'] == 'High'])
    
    exec_summary = f"""
GLOBALSUPPLY CORP - DATABRICKS MIGRATION ASSESSMENT
Executive Summary Report
Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}
========================================================

PROJECT SCOPE:
• {total_files} SQL components analyzed
• {df['Lines_of_Code'].sum():,} total lines of code
• {df['Table_References'].sum()} database table dependencies
• Supply chain analytics and reporting workloads

INVESTMENT REQUIRED:
• Development Effort: {total_hours} hours
• Estimated Cost: ${total_cost:,.0f} (labor only, at $150/hour)
• Timeline: {total_hours/120:.1f} weeks with 3-person team
• Phased Approach: 3 migration waves over 4-5 months

RISK ASSESSMENT:
• {high_risk_count} high-risk components requiring expert attention
• {len(df[df['Risk_Level'] == 'Medium'])} medium-risk components for standard migration
• {len(df[df['Risk_Level'] == 'Low'])} low-risk components for quick wins
• Comprehensive dependency mapping completed
• Mitigation strategies defined for all risk categories

BUSINESS BENEFITS:
• Performance: 3-5x improvement in query execution
• Analytics: Advanced ML/AI capabilities for supply chain optimization
• User Experience: Natural language queries with Databricks Genie
• Cost Efficiency: 20-30% reduction in infrastructure costs
• Scalability: Elastic compute scaling for peak demand periods
• Innovation: 2x faster development of new analytics

FINANCIAL PROJECTIONS:
• Annual Cost Savings: $200,000+ (infrastructure + productivity)
• Payback Period: {(total_cost / (200000 / 12)):.1f} months
• 3-Year Net Benefit (undiscounted): ${(200000 * 3 - total_cost):,.0f}
• ROI: {((200000 * 3 - total_cost) / total_cost * 100):.0f}% over 3 years

STRATEGIC RECOMMENDATION:
PROCEED with Databricks migration using Lakebridge toolchain.
The analysis demonstrates strong business justification with
manageable technical risk and clear path to success.

KEY SUCCESS FACTORS:
• Executive sponsorship and dedicated team
• Phased migration approach starting with quick wins
• Comprehensive testing and validation procedures
• User training and change management program
• Partnership with Databricks Professional Services

NEXT STEPS:
1. Secure budget approval and team assignment
2. Set up Databricks workspace and development environment  
3. Begin Schema Migration & Transpilation workshop (Module 2)
4. Execute Wave 1 migrations within 4-6 weeks

This assessment provides the foundation for a successful
modernization that will transform GlobalSupply Corp's 
analytics capabilities and competitive advantage.
"""
    
    try:
        with open('globalsupply_executive_summary.txt', 'w', encoding='utf-8') as f:
            f.write(exec_summary)
        print("✅ Executive summary exported: globalsupply_executive_summary.txt")
    except Exception as e:
        print(f"⚠️ Could not export executive summary: {e}")
    
    # 3. Create project tracking template
    if 'Migration_Wave' in df.columns:
        wave_summary = df.groupby('Migration_Wave').agg(
            File_Count=('File_Name', 'count'),
            Total_Hours=('Migration_Hours', 'sum'),
            Avg_Complexity=('Complexity_Score', 'mean')
        ).round(1)
        
        try:
            wave_summary.to_csv('globalsupply_project_waves.csv')
            print("✅ Project wave summary exported: globalsupply_project_waves.csv")
        except Exception as e:
            print(f"⚠️ Could not export wave summary: {e}")
    
    print("\n📊 ASSESSMENT COMPLETE - Ready for Next Phase")
    print("="*60)
    print("🎯 Deliverables Generated:")
    print("   📄 Executive Summary (for leadership team)")
    print("   📋 Detailed Migration Plan (for technical team)")
    print("   📊 Project Wave Summary (for project managers)")
    print("\n🚀 Next Workshop Module: Schema Migration & Transpilation")
    print("   Location: ../02_transpilation/")
    print("   Focus: Hands-on SQL conversion using Lakebridge")
    
else:
    print("⚠️ No assessment data available for export")
    print("\n📝 To generate real assessment data:")
    print("   1. Export your SQL Server workloads to files")
    print("   2. Run: databricks labs lakebridge analyze --source-tech mssql")
    print("   3. Re-run this notebook with the generated Excel report")
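
When no `Complexity_Analysis` sheet is found, each cell falls back to a warning, but a report that has the sheet and lacks one of the columns the cells read fails later with a `KeyError`. A minimal sketch of checking up front, assuming `sheets_data` is a dict of DataFrames such as `pd.read_excel(path, sheet_name=None)` returns (reading `.xlsx` requires `openpyxl`). `REQUIRED_COLUMNS` and `missing_columns` are hypothetical; the list covers only what this notebook's code reads and is not Lakebridge's documented report schema.

```python
import pandas as pd

# Columns this notebook reads from the Complexity_Analysis sheet. Derived from the
# notebook's own code, not from Lakebridge's documented report schema.
REQUIRED_COLUMNS = {
    "File_Name", "Complexity_Score", "Migration_Hours", "Risk_Level",
    "Lines_of_Code", "Table_References",
}

def missing_columns(sheets_data: dict, sheet: str = "Complexity_Analysis") -> set:
    """Return required columns absent from the sheet (all of them if the sheet is missing)."""
    df = sheets_data.get(sheet)
    if df is None:
        return set(REQUIRED_COLUMNS)
    return REQUIRED_COLUMNS - set(df.columns)
```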