# GlobalSupply Corp - Migration Assessment Analysis

## 🏢 Business Context
**GlobalSupply Corp** operates a complex supply chain data warehouse built on SQL Server with TPC-H-style schemas. This workshop demonstrates how to migrate this legacy system to Databricks with Lakebridge, with the goal of enabling:

- **AI-powered supply chain optimization**
- **Natural language queries for business users**  
- **Real-time analytics and forecasting**
- **Scalable cloud-native architecture**

## 📊 What This Notebook Covers
- **Migration Complexity Analysis** - Which components require the most effort
- **Dependency Mapping** - Critical interdependencies affecting migration sequencing
- **Risk Assessment** - Potential challenges and mitigation strategies
- **Business Value Analysis** - ROI projections for Databricks migration

---

## 🔧 Prerequisites & Setup

### Step 1: Install Required Python Dependencies
```bash
# Install essential packages for analysis and Excel support
pip install pandas matplotlib seaborn numpy openpyxl
```

**Note:** `openpyxl` is required to read Excel files generated by Lakebridge Analyzer.

### Step 2: Install Lakebridge (Optional - for real assessments)
```bash
# Install Databricks Labs Lakebridge
databricks labs install lakebridge

# Verify installation
databricks labs lakebridge analyze --help
```

### Step 3: Prepare Legacy SQL Files
For real assessments, export your SQL Server workloads to a directory structure like:
```
legacy_sql/
├── analytics/
│   ├── supply_chain_performance.sql
│   ├── inventory_optimization.sql
│   └── customer_profitability.sql
├── reports/
│   ├── financial_summary.sql
│   └── operational_reports.sql
└── etl/
    ├── data_processing.sql
    └── transformations.sql
```
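
Before running an assessment, it can help to confirm what the export actually contains. The following is a minimal sketch and not part of Lakebridge: the helper name `inventory_sql_files` is hypothetical, and it assumes the layout above, with one subfolder per workload category.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def inventory_sql_files(root: str) -> Dict[str, List[str]]:
    """Group .sql files under `root` by their top-level subfolder.

    Hypothetical helper for illustration; assumes one folder per category
    (analytics/, reports/, etl/) as in the layout above.
    """
    root_path = Path(root)
    inventory: Dict[str, List[str]] = defaultdict(list)
    for sql_file in sorted(root_path.rglob("*.sql")):
        relative = sql_file.relative_to(root_path)
        # Files sitting directly under the root have no category folder
        category = relative.parts[0] if len(relative.parts) > 1 else "(root)"
        inventory[category].append(relative.name)
    return dict(inventory)
```

Calling `inventory_sql_files("legacy_sql")` on the tree above would return a dictionary keyed by `analytics`, `reports`, and `etl`, which is a quick way to spot an empty or misnamed folder before the analyzer runs.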

### Step 4: Run Assessment (with sample data or real SQL files)
```bash
# Option 1: Use sample SQL files for demonstration
python 01_assessment_analyzer.py --generate-samples

# Option 2: Assess real SQL files
python 01_assessment_analyzer.py --source-directory /path/to/legacy_sql

# Option 3: Use Lakebridge directly (if installed and configured)
databricks labs lakebridge analyze \
  --source-directory /path/to/legacy_sql \
  --report-file globalsupply_assessment \
  --source-tech mssql
```

## 📋 Understanding Lakebridge Analyzer Output

The Lakebridge Analyzer generates comprehensive insights including:

### 🔍 Analysis Insights
- **Job Complexity Assessment** - Complexity scores (1-10) for each SQL component
- **Comprehensive Job Inventory** - All mappings, transformations, functions cataloged
- **Cross-System Interdependency Mapping** - Shows how components interact
- **Migration Effort Estimates** - Engineering hours required per component

### 📊 Key Outputs
- **Excel Report** with multiple worksheets:
  - Complexity Analysis
  - Dependency Mapping  
  - Function Usage Statistics
  - Migration Estimates
  - Risk Assessment

### 🎯 Business Value
- **Risk Assessment** for migration planning
- **Resource Planning** for modernization project  
- **Sequencing Guidance** to minimize disruption
- **TCO Analysis** for Databricks migration ROI

In [None]:
# Import required libraries for assessment analysis
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from pathlib import Path
import warnings
from typing import Dict, List, Optional, Union
import logging
warnings.filterwarnings('ignore')

# Configure plotting style for professional reports
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("✅ Libraries imported successfully")
print("📊 Ready for GlobalSupply Corp assessment analysis")

## 📄 Intelligent Report Loading System

This enhanced loading system can handle:
- **Real Lakebridge Assessment Reports** - Automatically detects and normalizes various Excel formats
- **Sample Workshop Data** - Falls back to demonstration data when no real report exists
- **Column Mapping** - Intelligently maps different column names to standard format
- **Error Recovery** - Provides clear guidance when issues occur

### Expected Excel Worksheets:
- **Summary** - High-level overview and metrics
- **Complexity_Analysis** - Detailed complexity scores and effort estimates
- **Dependencies** - Component interdependencies and relationships
- **Function_Usage** - SQL function usage statistics and compatibility
- **Migration_Waves** - Recommended migration sequencing strategy
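
Sheet names in real reports vary in case, spacing, and separators. As a rough sketch of the matching idea, each standard name can be paired with a list of aliases and compared after normalization. The helper `resolve_sheet`, the alias subset, and the normalization rule are assumptions for illustration, not Lakebridge behavior.

```python
import re
from typing import Dict, List, Optional

# Aliases for a subset of the standard worksheet names (illustrative only)
SHEET_ALIASES: Dict[str, List[str]] = {
    "Complexity_Analysis": ["Complexity", "Complexity_Analysis", "Job_Analysis"],
    "Dependencies": ["Dependencies", "Dependency", "Relationships"],
    "Function_Usage": ["Functions", "Function_Usage", "SQL_Functions"],
}


def _normalize(name: str) -> str:
    """Lowercase and collapse runs of spaces, hyphens, and underscores into '_'."""
    return re.sub(r"[\s_\-]+", "_", name.strip()).lower()


def resolve_sheet(standard_name: str, available: List[str]) -> Optional[str]:
    """Return the first available sheet that matches an alias, else None."""
    by_normalized = {_normalize(sheet): sheet for sheet in available}
    for alias in SHEET_ALIASES.get(standard_name, []):
        match = by_normalized.get(_normalize(alias))
        if match is not None:
            return match
    return None
```

Comparing normalized names means a sheet called `complexity analysis` or `SQL-Functions` still resolves without every spelling being listed as its own alias.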

In [None]:
def find_assessment_report() -> Optional[Path]:
    """
    Intelligent assessment report discovery with multiple fallback strategies
    """
    current_dir = Path('.')
    
    # Look for assessment reports with comprehensive naming patterns
    patterns = [
        '*assessment*.xlsx', '*globalsupply*.xlsx', '*remorph*.xlsx',
        '*lakebridge*.xlsx', '*migration*.xlsx', '*analysis*.xlsx'
    ]
    
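    excel_files = []
    for pattern in patterns:
        excel_files.extend(current_dir.glob(pattern))
    
    if not excel_files:
        # Look for any Excel files as fallback
        excel_files = list(current_dir.glob('*.xlsx'))
    
    # Skip Excel lock files (e.g. '~$report.xlsx'); they are not readable workbooks
    excel_files = [f for f in excel_files if not f.name.startswith('~$')]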
    
    if excel_files:
        # Return the most recent file
        latest_file = max(excel_files, key=lambda x: x.stat().st_mtime)
        return latest_file
    
    return None

def normalize_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """
    Normalize column names to standard format for consistent processing
    """
    # Standardize column names: cast to str (headers may be numeric), trim, lowercase, use underscores
    df.columns = [str(col).strip().replace(' ', '_').replace('-', '_').lower() for col in df.columns]
    return df

def map_columns_to_standard(df: pd.DataFrame, sheet_type: str) -> pd.DataFrame:
    """
    Map various column name variations to standardized names
    """
    if sheet_type == 'Complexity_Analysis':
        column_mappings = {
            'filename': 'file_name', 'file': 'file_name', 'script_name': 'file_name',
            'name': 'file_name', 'object_name': 'file_name',
            'loc': 'lines_of_code', 'lines': 'lines_of_code', 'line_count': 'lines_of_code',
            'complexity': 'complexity_score', 'score': 'complexity_score',
            'effort': 'migration_hours', 'hours': 'migration_hours', 
            'migration_effort': 'migration_hours', 'work_hours': 'migration_hours',
            'risk': 'risk_level', 'risk_category': 'risk_level',
            'functions': 'functions_used', 'function_count': 'functions_used',
            'tables': 'table_references', 'table_count': 'table_references'
        }
    elif sheet_type == 'Dependencies':
        column_mappings = {
            'source': 'source_object', 'from_object': 'source_object',
            'target': 'target_object', 'to_object': 'target_object',
            'type': 'dependency_type', 'dep_type': 'dependency_type',
            'priority': 'criticality', 'importance': 'criticality'
        }
    elif sheet_type == 'Function_Usage':
        column_mappings = {
            'function': 'function_name', 'func_name': 'function_name',
            'count': 'usage_count', 'usage': 'usage_count',
            'complexity_score': 'complexity_impact', 'impact': 'complexity_impact',
            'compatibility': 'databricks_compatibility', 'compat': 'databricks_compatibility'
        }
    else:
        column_mappings = {}
    
    # Apply mappings
    for old_col, new_col in column_mappings.items():
        if old_col in df.columns and new_col not in df.columns:
            df[new_col] = df[old_col]
    
    return df

def ensure_required_columns(df: pd.DataFrame, sheet_type: str) -> pd.DataFrame:
    """
    Ensure all required columns exist with sensible defaults
    """
    if sheet_type == 'Complexity_Analysis':
        required_columns = {
            'file_name': 'Unknown',
            'complexity_score': 5.0,
            'migration_hours': 8,
            'risk_level': 'Medium',
            'lines_of_code': 100,
            'category': 'Other',
            'sql_features': 'Standard SQL'
        }
    elif sheet_type == 'Dependencies':
        required_columns = {
            'source_object': 'Unknown',
            'target_object': 'Unknown', 
            'dependency_type': 'Unknown',
            'criticality': 'Medium'
        }
    elif sheet_type == 'Function_Usage':
        required_columns = {
            'function_name': 'Unknown',
            'usage_count': 1,
            'complexity_impact': 3,
            'databricks_compatibility': 'Unknown'
        }
    else:
        required_columns = {}
    
    # Add missing columns with defaults
    for col, default_val in required_columns.items():
        if col not in df.columns:
            df[col] = default_val
            print(f"➕ Added missing column '{col}' with default value")
    
    return df

def create_sample_data() -> Dict[str, pd.DataFrame]:
    """
    Create comprehensive sample data for workshop demonstration
    """
    print("🎭 Creating comprehensive sample GlobalSupply Corp assessment data...")
    
    # Sample complexity analysis representing real-world SQL Server workloads
    complexity_data = pd.DataFrame({
        'file_name': [
            'supply_chain_performance.sql', 'inventory_optimization.sql', 'customer_profitability.sql',
            'supplier_risk_assessment.sql', 'dynamic_reporting.sql', 'window_functions_analysis.sql',
            'financial_summary.sql', 'order_processing.sql'
        ],
        'lines_of_code': [145, 198, 167, 223, 89, 134, 67, 89],
        'complexity_score': [8.5, 9.2, 7.8, 9.8, 6.5, 7.2, 4.5, 5.1],
        'functions_used': [12, 18, 14, 22, 8, 16, 6, 9],
        'table_references': [8, 6, 7, 9, 5, 4, 3, 4],
        'migration_hours': [16, 24, 18, 32, 8, 12, 4, 6],
        'risk_level': ['Medium', 'High', 'Medium', 'High', 'Low', 'Medium', 'Low', 'Low'],
        'category': ['Analytics', 'Analytics', 'Analytics', 'Analytics', 'Reporting', 'Analytics', 'Reporting', 'OLTP'],
        'sql_features': [
            'CTEs, Window Functions, Complex Joins',
            'Recursive CTEs, PIVOT, Advanced Analytics', 
            'PIVOT, Window Functions, String Aggregation',
            'Recursive CTEs, Dynamic SQL, Risk Scoring',
            'Dynamic SQL, Conditional Logic',
            'Advanced Window Functions, LAG/LEAD',
            'Basic Aggregation, Simple Joins',
            'CRUD Operations, Transactions'
        ]
    })
    
    # Sample dependency mapping with realistic relationships
    dependency_data = pd.DataFrame({
        'source_object': [
            'customer_profitability.sql', 'supplier_risk_assessment.sql', 'inventory_optimization.sql',
            'supply_chain_performance.sql', 'window_functions_analysis.sql', 'order_processing.sql'
        ],
        'target_object': [
            'financial_summary.sql', 'supply_chain_performance.sql', 'order_processing.sql',
            'dynamic_reporting.sql', 'customer_profitability.sql', 'financial_summary.sql'
        ],
        'dependency_type': ['View', 'Procedure', 'Table', 'View', 'Function', 'Table'],
        'criticality': ['High', 'High', 'Medium', 'Medium', 'Low', 'Medium']
    })
    
    # Sample function usage with Databricks compatibility assessment
    function_data = pd.DataFrame({
        'function_name': [
            'ROW_NUMBER', 'RANK', 'LAG', 'LEAD', 'SUM', 'AVG', 'COUNT', 'DATEDIFF', 
            'DATEADD', 'STRING_AGG', 'PIVOT', 'CASE', 'CTE', 'RECURSIVE_CTE', 'STDEV', 'VAR'
        ],
        'usage_count': [15, 12, 8, 6, 25, 20, 30, 18, 14, 5, 3, 22, 18, 2, 4, 3],
        'complexity_impact': [3, 3, 4, 4, 1, 1, 1, 2, 2, 4, 5, 2, 3, 5, 3, 3],
        'databricks_compatibility': [
            'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Direct', 'Modified',
            'Modified', 'Modified', 'Modified', 'Direct', 'Direct', 'Complex', 'Direct', 'Direct'
        ]
    })
    
    return {
        'Complexity_Analysis': complexity_data,
        'Dependencies': dependency_data,
        'Function_Usage': function_data
    }

In [None]:
# Enhanced assessment report loading with intelligent fallback
def load_assessment_data() -> Dict[str, pd.DataFrame]:
    """
    Master function to load assessment data from various sources
    """
    sheets_data = {}
    
    # Step 1: Try to find existing assessment report
    report_file = find_assessment_report()
    
    if report_file:
        print(f"📄 Found assessment report: {report_file}")
        
        try:
            # Load the Excel file and examine structure
            excel_data = pd.ExcelFile(report_file)
            print(f"📋 Available worksheets: {excel_data.sheet_names}")
            
            # Map common sheet name variations to standard names
            sheet_mappings = {
                'Summary': ['Summary', 'Overview', 'Report_Summary'],
                'Complexity_Analysis': ['Complexity', 'Analysis', 'Complexity_Analysis', 'Job_Analysis', 'Complexity Analysis'],
                'Dependencies': ['Dependencies', 'Dependency', 'Relationships', 'Job_Dependencies'],
                'Function_Usage': ['Functions', 'Function_Usage', 'SQL_Functions', 'Features', 'Function Usage'],
                'Migration_Waves': ['Waves', 'Migration_Waves', 'Migration_Strategy', 'Migration Waves']
            }
            
            # Load and normalize each sheet
            for standard_name, possible_names in sheet_mappings.items():
                sheet_found = False
                for possible_name in possible_names:
                    if possible_name in excel_data.sheet_names:
                        try:
                            df = pd.read_excel(report_file, sheet_name=possible_name)
                            if len(df) > 0:  # Only process non-empty sheets
                                # Apply all normalization steps
                                df = normalize_column_names(df)
                                df = map_columns_to_standard(df, standard_name)
                                df = ensure_required_columns(df, standard_name)
                                
                                sheets_data[standard_name] = df
                                print(f"✅ Loaded '{possible_name}' as '{standard_name}': {len(df)} rows, {len(df.columns)} columns")
                                sheet_found = True
                                break
                        except Exception as e:
                            print(f"⚠️ Could not load sheet '{possible_name}': {e}")
                
                if not sheet_found:
                    print(f"⚠️ No matching sheet found for '{standard_name}'")
            
            # If we got at least some data, we're good
            if sheets_data:
                print(f"✅ Successfully loaded real assessment report with {len(sheets_data)} worksheets")
                return sheets_data
            
        except Exception as e:
            print(f"❌ Error loading Excel file: {e}")
    
    # Step 2: Fallback to sample data
    print("📊 No usable assessment report found, using sample workshop data")
    print("")
    print("💡 To use a real assessment report:")
    print("   1. Ensure Lakebridge is installed: databricks labs install lakebridge")
    print("   2. Run assessment: python 01_assessment_analyzer.py --source-directory /path/to/legacy_sql")
    print("   3. Or place your existing .xlsx report in this directory")
    print("")
    
    return create_sample_data()

# Load the assessment data using our intelligent system
sheets_data = load_assessment_data()

# Display summary of loaded data
print("\n" + "="*70)
print("📊 ASSESSMENT DATA LOADING SUMMARY")
print("="*70)

for sheet_name, df in sheets_data.items():
    print(f"📋 {sheet_name}: {len(df)} rows, {len(df.columns)} columns")
    if len(df.columns) > 0:
        print(f"   Columns: {', '.join(df.columns[:5])}{'...' if len(df.columns) > 5 else ''}")

print("="*70)
print("🎯 Data loading complete! Ready for analysis and visualization.")

## 🎯 Assessment Overview Dashboard

This section provides a high-level overview of the migration assessment, showing key metrics that executives and project managers need to understand the scope and complexity of the modernization effort.
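
As a worked example of the arithmetic, the sketch below applies the same assumptions as the overview cell ($150/hour, 40-hour weeks, a 4-person team) to the eight `migration_hours` values in the workshop sample data. Treating the team estimate as perfectly parallel work is a simplification.

```python
# Per-file effort estimates from the workshop sample data
migration_hours = [16, 24, 18, 32, 8, 12, 4, 6]

HOURLY_RATE_USD = 150   # blended rate used by the overview cell
HOURS_PER_WEEK = 40     # one developer, full time
TEAM_SIZE = 4           # assumes work parallelizes perfectly across the team

total_hours = sum(migration_hours)
estimated_cost = total_hours * HOURLY_RATE_USD
weeks_single = total_hours / HOURS_PER_WEEK
weeks_team = total_hours / (HOURS_PER_WEEK * TEAM_SIZE)

print(f"Total hours: {total_hours}")                   # Total hours: 120
print(f"Estimated cost: ${estimated_cost:,}")          # Estimated cost: $18,000
print(f"Single developer: {weeks_single:.1f} weeks")   # Single developer: 3.0 weeks
print(f"4-person team: {weeks_team:.1f} weeks")        # 4-person team: 0.8 weeks
```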

In [None]:
# Generate comprehensive assessment overview with robust error handling
if 'Complexity_Analysis' in sheets_data and len(sheets_data['Complexity_Analysis']) > 0:
    df = sheets_data['Complexity_Analysis']
    
    print("\n" + "="*80)
    print("📊 GLOBALSUPPLY CORP - MIGRATION ASSESSMENT OVERVIEW")
    print("="*80)
    
    # Key metrics with safe calculations
    total_files = len(df)
    total_loc = df['lines_of_code'].sum() if 'lines_of_code' in df.columns else 0
    total_hours = df['migration_hours'].sum() if 'migration_hours' in df.columns else 0
    avg_complexity = df['complexity_score'].mean() if 'complexity_score' in df.columns else 5.0
    
    print(f"📁 Total SQL Components: {total_files}")
    print(f"📏 Total Lines of Code: {total_loc:,}")
    print(f"⏱️  Estimated Migration Hours: {total_hours}")
    print(f"📈 Average Complexity Score: {avg_complexity:.1f}/10")
    print(f"💰 Estimated Cost (@$150/hour): ${total_hours * 150:,}")
    
    # Risk distribution analysis with error handling
    if 'risk_level' in df.columns:
        risk_distribution = df['risk_level'].value_counts()
        print(f"\n🚦 RISK DISTRIBUTION:")
        for risk, count in risk_distribution.items():
            percentage = (count/len(df)*100)
            risk_emoji = {'Low': '🟢', 'Medium': '🟡', 'High': '🔴'}
            print(f"   {risk_emoji.get(risk, '⚪')} {risk}: {count} files ({percentage:.1f}%)")
    
    # Category distribution with error handling
    if 'category' in df.columns:
        category_distribution = df['category'].value_counts()
        print(f"\n📊 WORKLOAD CATEGORIES:")
        for category, count in category_distribution.items():
            percentage = (count/len(df)*100)
            print(f"   📈 {category}: {count} files ({percentage:.1f}%)")
    
    # Timeline estimates
    weeks_single = total_hours / 40 if total_hours > 0 else 0
    weeks_team = total_hours / 160 if total_hours > 0 else 0  # 4-person team
    
    print(f"\n📅 TIMELINE ESTIMATES:")
    print(f"   👤 Single Developer: {weeks_single:.1f} weeks")
    print(f"   👥 4-Person Team: {weeks_team:.1f} weeks")
    
    print("\n💡 KEY INSIGHTS:")
    high_risk_count = len(df[df['risk_level'] == 'High']) if 'risk_level' in df.columns else 0
    complex_count = len(df[df['complexity_score'] > 8]) if 'complexity_score' in df.columns else 0
    
    if high_risk_count > 0:
        print(f"   ⚠️  {high_risk_count} high-risk components need expert attention")
    if complex_count > 0:
        print(f"   🧠 {complex_count} highly complex components (>8/10 complexity)")
    print(f"   🎯 Recommended phased approach in 3 migration waves")
    print(f"   📈 Expected 3-5x performance improvement with Databricks")
    
    print("="*80)
else:
    print("⚠️ Complexity analysis data not available")
    print("💡 This could happen if:")
    print("   • The Excel report doesn't have a recognized complexity worksheet")
    print("   • The worksheet is empty or has incorrect column names")
    print("   • Try regenerating the assessment report")

## 📈 Complexity Analysis Visualizations

These visualizations help identify which components will require the most attention during migration and provide insights into the overall complexity distribution of the SQL codebase.
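
One of the panels draws a straight trend line through effort versus complexity. As a minimal, dependency-free sketch of what a degree-1 least-squares fit computes, the function below (`fit_line` is a hypothetical name) derives the slope and intercept from the sample `complexity_score` and `migration_hours` values.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # slope = covariance(x, y) / variance(x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Workshop sample data
complexity = [8.5, 9.2, 7.8, 9.8, 6.5, 7.2, 4.5, 5.1]
hours = [16, 24, 18, 32, 8, 12, 4, 6]

slope, intercept = fit_line(complexity, hours)
print(f"hours ≈ {slope:.2f} * complexity + {intercept:.2f}")
```

`np.polyfit(x, y, 1)` returns the same two coefficients, highest degree first. A positive slope here only says that effort grows with complexity in the sample; it is not a calibrated estimating model.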

In [None]:
# Robust complexity analysis with comprehensive error handling
if 'Complexity_Analysis' in sheets_data and len(sheets_data['Complexity_Analysis']) > 0:
    df = sheets_data['Complexity_Analysis']
    
    # Validate required columns exist
    required_cols = ['complexity_score', 'migration_hours']
    missing_cols = [col for col in required_cols if col not in df.columns]
    
    if missing_cols:
        print(f"⚠️ Missing required columns: {missing_cols}")
        print("Adding default values for visualization...")
        for col in missing_cols:
            if col == 'complexity_score':
                df[col] = np.random.uniform(4, 9, len(df))
            elif col == 'migration_hours':
                df[col] = np.random.randint(4, 32, len(df))
    
    try:
        # Create comprehensive complexity dashboard
        fig, axes = plt.subplots(2, 2, figsize=(16, 12))
        fig.suptitle('GlobalSupply Corp - Migration Complexity Analysis Dashboard', 
                     fontsize=16, fontweight='bold', y=0.98)
        
        # 1. Complexity Score Distribution
        axes[0, 0].hist(df['complexity_score'], bins=10, alpha=0.7, color='skyblue', 
                        edgecolor='black', linewidth=1.2)
        axes[0, 0].set_title('📊 Complexity Score Distribution', fontweight='bold')
        axes[0, 0].set_xlabel('Complexity Score (1-10 scale)')
        axes[0, 0].set_ylabel('Number of SQL Components')
        
        # Add mean line and statistics
        mean_complexity = df['complexity_score'].mean()
        axes[0, 0].axvline(mean_complexity, color='red', linestyle='--', linewidth=2,
                           label=f'Mean: {mean_complexity:.1f}')
        axes[0, 0].legend()
        
        # 2. Risk Level Distribution (if available)
        if 'risk_level' in df.columns:
            risk_counts = df['risk_level'].value_counts()
            colors = {'Low': '#2ecc71', 'Medium': '#f39c12', 'High': '#e74c3c'}
            risk_colors = [colors.get(risk, '#95a5a6') for risk in risk_counts.index]
            
            # explode needs one entry per wedge, so size it from the data
            wedges, texts, autotexts = axes[0, 1].pie(risk_counts.values, labels=risk_counts.index, 
                                                      autopct='%1.1f%%', colors=risk_colors, 
                                                      startangle=90, explode=[0.05] * len(risk_counts))
            axes[0, 1].set_title('🚦 Migration Risk Distribution', fontweight='bold')
            
            # Enhance pie chart text
            for autotext in autotexts:
                autotext.set_color('white')
                autotext.set_fontweight('bold')
        else:
            axes[0, 1].text(0.5, 0.5, 'Risk Level Data\nNot Available', 
                           transform=axes[0, 1].transAxes, ha='center', va='center',
                           fontsize=14, bbox=dict(boxstyle='round', facecolor='lightgray'))
            axes[0, 1].set_title('🚦 Migration Risk Distribution', fontweight='bold')
        
        # 3. Effort vs Complexity Scatter Plot
        bubble_size = df.get('lines_of_code', pd.Series([100] * len(df)))
        scatter = axes[1, 0].scatter(df['complexity_score'], df['migration_hours'], 
                                    c=bubble_size, cmap='viridis', alpha=0.8, 
                                    s=150, edgecolors='black', linewidth=0.5)
        
        axes[1, 0].set_xlabel('Complexity Score')
        axes[1, 0].set_ylabel('Migration Hours')
        axes[1, 0].set_title('⚡ Effort vs Complexity (color = Lines of Code)', fontweight='bold')
        
        # Add colorbar if we have size data
        if 'lines_of_code' in df.columns:
            cbar = plt.colorbar(scatter, ax=axes[1, 0])
            cbar.set_label('Lines of Code', rotation=270, labelpad=20)
        
        # Add trend line
        z = np.polyfit(df['complexity_score'], df['migration_hours'], 1)
        p = np.poly1d(z)
        axes[1, 0].plot(df['complexity_score'], p(df['complexity_score']), "r--", alpha=0.7)
        
        # 4. Category-wise Migration Hours (if available)
        if 'category' in df.columns:
            category_hours = df.groupby('category')['migration_hours'].sum().sort_values(ascending=True)
            bars = axes[1, 1].barh(category_hours.index, category_hours.values, 
                                   color=['#3498db', '#e67e22', '#9b59b6'], alpha=0.8)
            
            axes[1, 1].set_xlabel('Total Migration Hours')
            axes[1, 1].set_title('📊 Migration Effort by Category', fontweight='bold')
            
            # Add value labels on bars
            for i, (bar, value) in enumerate(zip(bars, category_hours.values)):
                axes[1, 1].text(value + 0.5, i, f'{int(value)}h', 
                                 va='center', fontweight='bold')
        else:
            axes[1, 1].text(0.5, 0.5, 'Category Data\nNot Available', 
                           transform=axes[1, 1].transAxes, ha='center', va='center',
                           fontsize=14, bbox=dict(boxstyle='round', facecolor='lightgray'))
            axes[1, 1].set_title('📊 Migration Effort by Category', fontweight='bold')
        
        plt.tight_layout()
        plt.subplots_adjust(top=0.93)
        plt.show()
        
        # Additional insights table with safe column access
        print("\n📋 DETAILED COMPLEXITY BREAKDOWN:")
        print("-" * 80)
        
        # Show top 5 most complex components
        display_columns = ['file_name', 'complexity_score', 'migration_hours']
        if 'sql_features' in df.columns:
            display_columns.append('sql_features')
        
        available_columns = [col for col in display_columns if col in df.columns]
        
        if available_columns:
            top_complex = df.nlargest(5, 'complexity_score')[available_columns]
            
            for _, row in top_complex.iterrows():
                print(f"📄 {row.get('file_name', 'Unknown')}:")
                print(f"   🎯 Complexity: {row.get('complexity_score', 'N/A')}/10")
                print(f"   ⏱️  Effort: {row.get('migration_hours', 'N/A')} hours")
                if 'sql_features' in row:
                    print(f"   🔧 Features: {row.get('sql_features', 'N/A')}")
                print()
        
    except Exception as e:
        print(f"❌ Error creating visualizations: {e}")
        print("This might be due to data format issues. The analysis can still continue.")
    
else:
    print("⚠️ Complexity analysis data not available for visualization")
    print("💡 Make sure your assessment report includes a 'Complexity_Analysis' worksheet")

## 🔗 Dependency Analysis

Understanding dependencies is crucial for migration sequencing. This analysis shows which components depend on others and helps plan the migration order to minimize risk and ensure system stability throughout the process.

### Why Dependencies Matter:
- **Migration Sequencing** - A component's dependencies must be migrated before the component itself
- **Risk Management** - High-criticality dependencies require extra attention
- **Testing Strategy** - Dependent components need integrated testing
- **Rollback Planning** - Understanding dependencies helps plan safe rollback procedures
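
The sequencing rule can be sketched as a topological sort. The example below uses the standard-library `graphlib` module on the dependency pairs from the workshop sample data. It assumes that in each pair the second component depends on the first, which matches how the notebook picks independent starting points; `migration_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple

# (upstream, downstream) pairs from the workshop sample data.
# Assumption: the downstream component depends on the upstream one.
edges: List[Tuple[str, str]] = [
    ("customer_profitability", "financial_summary"),
    ("supplier_risk_assessment", "supply_chain_performance"),
    ("inventory_optimization", "order_processing"),
    ("supply_chain_performance", "dynamic_reporting"),
    ("window_functions_analysis", "customer_profitability"),
    ("order_processing", "financial_summary"),
]


def migration_order(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return an order in which every upstream component precedes its dependents."""
    predecessors: Dict[str, Set[str]] = {}
    for upstream, downstream in pairs:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    # static_order() raises graphlib.CycleError if the dependencies are circular
    return list(TopologicalSorter(predecessors).static_order())


order = migration_order(edges)
for position, component in enumerate(order, start=1):
    print(f"{position}. {component}")
```

Several valid orders usually exist, so the exact listing may differ between runs of different inputs; what the sort guarantees is that no component appears before something it depends on. A `CycleError` is a useful signal that two components must be migrated together.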

In [None]:
# Robust dependency analysis with comprehensive error handling
if 'Dependencies' in sheets_data and len(sheets_data['Dependencies']) > 0:
    dep_df = sheets_data['Dependencies']
    
    print("🔗 DEPENDENCY ANALYSIS FOR MIGRATION PLANNING")
    print("="*70)
    
    try:
        print(f"📊 Total Dependencies Identified: {len(dep_df)}")
        
        # Analyze dependency criticality (if available)
        if 'criticality' in dep_df.columns:
            criticality_counts = dep_df['criticality'].value_counts()
            print("\n🚦 Dependency Criticality Breakdown:")
            for crit, count in criticality_counts.items():
                crit_emoji = {'High': '🔴', 'Medium': '🟡', 'Low': '🟢'}
                percentage = (count / len(dep_df) * 100)
                print(f"   {crit_emoji.get(crit, '⚪')} {crit}: {count} dependencies ({percentage:.1f}%)")
        
        # Analyze dependency types (if available)
        if 'dependency_type' in dep_df.columns:
            type_counts = dep_df['dependency_type'].value_counts()
            print("\n📋 Dependency Types:")
            for dep_type, count in type_counts.items():
                print(f"   📊 {dep_type}: {count} dependencies")
        
        print("\n🔗 Critical Dependency Chains:")
        print("-" * 50)
        
        # Show high-criticality dependencies first (if available)
        if 'criticality' in dep_df.columns:
            high_crit_deps = dep_df[dep_df['criticality'] == 'High']
        else:
            high_crit_deps = dep_df.head(5)  # Show first 5 as fallback
        
        for _, row in high_crit_deps.iterrows():
            source = str(row.get('source_object', 'Unknown')).replace('.sql', '')
            target = str(row.get('target_object', 'Unknown')).replace('.sql', '')
            dep_type = row.get('dependency_type', 'Unknown')
            criticality = row.get('criticality', 'Unknown')
            
            print(f"   {source} ➜ {target}")
            print(f"     Type: {dep_type} | Criticality: {criticality}")
            print()
        
        # Migration sequencing recommendations
        print("📋 MIGRATION SEQUENCING RECOMMENDATIONS:")
        print("-" * 50)
        
        # Find components with no dependencies (good starting points)
        if 'source_object' in dep_df.columns and 'target_object' in dep_df.columns:
            all_sources = set(dep_df['source_object'])
            all_targets = set(dep_df['target_object'])
            independent_components = all_sources - all_targets
            
            if independent_components:
                print("🏁 Start with these independent components:")
                for comp in sorted(independent_components, key=str)[:3]:  # sorted for stable output
                    print(f"   ✅ {str(comp).replace('.sql', '')}")
            
            # Find components that many others depend on
            dependency_counts = dep_df['source_object'].value_counts()
            if len(dependency_counts) > 0:
                print("\n🎯 High-priority components (many dependencies):")
                for comp, count in dependency_counts.head(3).items():
                    print(f"   🔴 {str(comp).replace('.sql', '')} ({count} dependent components)")
        
        # Create dependency visualizations
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        fig.suptitle('GlobalSupply Corp - Dependency Analysis', fontsize=14, fontweight='bold')
        
        # 1. Criticality distribution (if available)
        if 'criticality' in dep_df.columns:
            criticality_counts = dep_df['criticality'].value_counts()
            colors = {'High': '#e74c3c', 'Medium': '#f39c12', 'Low': '#2ecc71'}
            crit_colors = [colors.get(crit, '#95a5a6') for crit in criticality_counts.index]
            
            # explode needs one entry per wedge, so size it from the data
            wedges, texts, autotexts = ax1.pie(criticality_counts.values, 
                                               labels=criticality_counts.index,
                                               autopct='%1.1f%%', colors=crit_colors,
                                               startangle=90, explode=[0.05] * len(criticality_counts))
            ax1.set_title('🚦 Dependency Criticality', fontweight='bold')
            
            for autotext in autotexts:
                autotext.set_color('white')
                autotext.set_fontweight('bold')
        else:
            ax1.text(0.5, 0.5, 'Criticality Data\nNot Available', 
                    ha='center', va='center', fontsize=14,
                    bbox=dict(boxstyle='round', facecolor='lightgray'))
            ax1.set_title('🚦 Dependency Criticality', fontweight='bold')
        
        # 2. Dependency type distribution (if available)
        if 'dependency_type' in dep_df.columns:
            type_counts = dep_df['dependency_type'].value_counts()
            bars = ax2.bar(type_counts.index, type_counts.values, 
                          color=['#3498db', '#e67e22', '#9b59b6', '#1abc9c'][:len(type_counts)])
            
            ax2.set_title('📊 Dependencies by Type', fontweight='bold')
            ax2.set_xlabel('Dependency Type')
            ax2.set_ylabel('Count')
            ax2.tick_params(axis='x', rotation=45)
            
            # Add value labels on bars
            for bar, value in zip(bars, type_counts.values):
                ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1,
                         str(value), ha='center', va='bottom', fontweight='bold')
        else:
            ax2.text(0.5, 0.5, 'Dependency Type\nData Not Available', 
                    ha='center', va='center', fontsize=14,
                    bbox=dict(boxstyle='round', facecolor='lightgray'))
            ax2.set_title('📊 Dependencies by Type', fontweight='bold')
        
        plt.tight_layout()
        plt.show()
        
    except Exception as e:
        print(f"❌ Error in dependency analysis: {e}")
        print("This might be due to data format issues, but basic analysis completed.")
    
else:
    print("⚠️ Dependency analysis data not available")
    print("💡 In a real assessment, Lakebridge would identify:")
    print("   • Table-to-table dependencies")
    print("   • View-to-table relationships")
    print("   • Stored procedure call chains")
    print("   • Cross-database dependencies")

## 🔧 SQL Function Analysis

This analysis examines SQL functions and features used in the legacy code to understand:
- **Databricks Compatibility** - Which functions translate directly vs need modification
- **Complexity Drivers** - Functions that contribute most to migration complexity
- **Focus Areas** - Where development effort should be concentrated
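
The compatibility breakdown in the cell below counts each unique function once. A usage-weighted view can be a useful complement, since one heavily used function may matter more than several rare ones. The following is a minimal sketch under stated assumptions: the column names match those the notebook reads from the `Function_Usage` sheet, while the rows, the placeholder function names, and the helper `compatibility_shares` are hypothetical.

```python
import pandas as pd

# Hypothetical rows for illustration; only the column names come from the notebook.
func_df = pd.DataFrame({
    "function_name": ["FUNC_A", "FUNC_B", "FUNC_C", "FUNC_D"],
    "usage_count": [40, 30, 20, 10],
    "databricks_compatibility": ["Direct", "Direct", "Modified", "Complex"],
})


def compatibility_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Percent share per compatibility category, by function count and by usage."""
    grouped = df.groupby("databricks_compatibility")
    by_count = grouped.size() / len(df) * 100
    by_usage = grouped["usage_count"].sum() / df["usage_count"].sum() * 100
    return pd.DataFrame(
        {"pct_of_functions": by_count, "pct_of_usages": by_usage}
    ).round(1)


shares = compatibility_shares(func_df)
print(shares)
```

In this made-up sample, "Direct" functions are half of the unique functions but account for a larger share of actual usages, which is the kind of gap the two columns are meant to expose.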

In [None]:
# Comprehensive function analysis with robust error handling
if 'Function_Usage' in sheets_data and len(sheets_data['Function_Usage']) > 0:
    func_df = sheets_data['Function_Usage']
    
    print("🔧 SQL FUNCTION COMPATIBILITY ANALYSIS")
    print("="*70)
    
    try:
        # Overall statistics
        total_functions = func_df['usage_count'].sum() if 'usage_count' in func_df.columns else len(func_df)
        unique_functions = len(func_df)
        
        print(f"📊 Total Function Usages: {total_functions}")
        print(f"🔧 Unique Functions: {unique_functions}")
        
        # Compatibility breakdown (if available)
        if 'databricks_compatibility' in func_df.columns:
            compat_counts = func_df['databricks_compatibility'].value_counts()
            print("\n🎯 DATABRICKS COMPATIBILITY BREAKDOWN:")
            
            compat_emojis = {
                'Direct': '✅ Direct translation',
                'Modified': '🔄 Requires modification', 
                'Complex': '⚠️ Complex migration',
                'Manual': '🛠️ Manual rewrite needed'
            }
            
            for compat, count in compat_counts.items():
                percentage = (count / len(func_df) * 100)
                emoji_desc = compat_emojis.get(compat, f'🔍 {compat}')
                print(f"   {emoji_desc}: {count} functions ({percentage:.1f}%)")
        
        # Top complexity drivers (if available)
        if 'complexity_impact' in func_df.columns:
            print("\n🎯 TOP COMPLEXITY DRIVERS:")
            complexity_drivers = func_df.nlargest(5, 'complexity_impact')
            
            for _, row in complexity_drivers.iterrows():
                function = row.get('function_name', 'Unknown')
                usage = row.get('usage_count', 'N/A')
                complexity = row.get('complexity_impact', 'N/A')
                compat = row.get('databricks_compatibility', 'Unknown')
                
                print(f"   🔧 {function}:")
                print(f"      Used {usage} times | Complexity: {complexity}/5 | Compatibility: {compat}")
        
        # Most used functions (if available)
        if 'usage_count' in func_df.columns:
            print("\n📈 MOST FREQUENTLY USED FUNCTIONS:")
            top_used = func_df.nlargest(5, 'usage_count')
            
            for _, row in top_used.iterrows():
                function = row.get('function_name', 'Unknown')
                usage = row.get('usage_count', 'N/A')
                compat = row.get('databricks_compatibility', 'Unknown')
                
                status_emoji = {'Direct': '✅', 'Modified': '🔄', 'Complex': '⚠️', 'Manual': '🛠️'}.get(compat, '🔍')
                print(f"   {status_emoji} {function}: {usage} usages ({compat})")
        
        # Create function analysis visualization
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 10))
        fig.suptitle('SQL Function Usage & Compatibility Analysis', fontsize=14, fontweight='bold')
        
        # 1. Compatibility distribution (if available)
        if 'databricks_compatibility' in func_df.columns:
            compat_counts = func_df['databricks_compatibility'].value_counts()
            colors = {'Direct': '#2ecc71', 'Modified': '#f39c12', 'Complex': '#e74c3c', 'Manual': '#8e44ad'}
            compat_colors = [colors.get(comp, '#95a5a6') for comp in compat_counts.index]
            
            ax1.pie(compat_counts.values, labels=compat_counts.index, autopct='%1.1f%%',
                    colors=compat_colors, startangle=90)
            ax1.set_title('🎯 Compatibility Distribution')
        else:
            ax1.text(0.5, 0.5, 'Compatibility Data\nNot Available', 
                    ha='center', va='center', fontsize=14,
                    bbox=dict(boxstyle='round', facecolor='lightgray'))
            ax1.set_title('🎯 Compatibility Distribution')
        
        # 2. Top used functions (if available)
        if 'usage_count' in func_df.columns and 'function_name' in func_df.columns:
            top_used = func_df.nlargest(min(8, len(func_df)), 'usage_count')
            bars = ax2.barh(top_used['function_name'], top_used['usage_count'], color='skyblue')
            ax2.set_title('📊 Most Used Functions')
            ax2.set_xlabel('Usage Count')
            
            # Add value labels
            for bar, value in zip(bars, top_used['usage_count']):
                ax2.text(bar.get_width() + 0.3, bar.get_y() + bar.get_height()/2,
                         str(value), va='center', fontweight='bold')
        else:
            ax2.text(0.5, 0.5, 'Usage Count Data\nNot Available', 
                    ha='center', va='center', fontsize=14,
                    bbox=dict(boxstyle='round', facecolor='lightgray'))
            ax2.set_title('📊 Most Used Functions')
        
        # 3. Complexity vs Usage scatter (if available)
        if 'complexity_impact' in func_df.columns and 'usage_count' in func_df.columns:
            scatter = ax3.scatter(func_df['usage_count'], func_df['complexity_impact'],
                                 alpha=0.7, s=100, color='coral')
            ax3.set_xlabel('Usage Count')
            ax3.set_ylabel('Complexity Impact (1-5)')
            ax3.set_title('⚡ Usage vs Complexity')
            ax3.grid(True, alpha=0.3)
        else:
            ax3.text(0.5, 0.5, 'Complexity/Usage\nData Not Available', 
                    ha='center', va='center', fontsize=14,
                    bbox=dict(boxstyle='round', facecolor='lightgray'))
            ax3.set_title('⚡ Usage vs Complexity')
        
        # 4. Key insights
        ax4.text(0.1, 0.8, "🎯 KEY MIGRATION INSIGHTS:", transform=ax4.transAxes, 
                 fontsize=12, fontweight='bold')
        
        insights = [
            "• Focus on high-usage complex functions first",
            "• Test modified functions thoroughly", 
            "• Consider performance implications",
            "• Plan for user training on new syntax"
        ]
        
        for i, insight in enumerate(insights):
            ax4.text(0.1, 0.6 - i*0.1, insight, transform=ax4.transAxes, fontsize=10)
        
        ax4.axis('off')
        
        plt.tight_layout()
        plt.show()
        
    except Exception as e:
        print(f"❌ Error in function analysis: {e}")
        print("This might be due to data format issues, but basic analysis completed.")
    
else:
    print("⚠️ Function usage data not available")
    print("💡 In a real assessment, this would show:")
    print("   • SQL Server specific functions used")
    print("   • Databricks compatibility status")
    print("   • Migration complexity by function type")

## 📋 Migration Planning Dashboard

This section provides actionable insights for planning the migration, including:
- **Migration Wave Strategy** - Phased approach based on complexity and risk
- **Resource Planning** - Team size and timeline estimates
- **Risk Mitigation** - Specific recommendations for high-risk components
- **Cost-Benefit Analysis** - Investment vs expected returns
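
The wave rule and the timeline arithmetic used in the cell below can be sketched as two small helpers. This is an illustrative sketch, not the notebook's own implementation: `assign_waves` and `estimate_weeks` are hypothetical names, the sample rows are made up, and the 40-hour week with an even split across developers is an assumption that ignores coordination overhead.

```python
import numpy as np
import pandas as pd

WAVE_1 = "Wave 1 - Quick Wins"
WAVE_2 = "Wave 2 - Standard Migration"
WAVE_3 = "Wave 3 - Complex Components"


def assign_waves(df: pd.DataFrame) -> pd.Series:
    """Vectorized form of the wave rule: low risk and low complexity first."""
    # Use the same defaults as the row-wise rule when a column is missing.
    risk = (df["risk_level"] if "risk_level" in df.columns
            else pd.Series("Medium", index=df.index))
    complexity = (df["complexity_score"] if "complexity_score" in df.columns
                  else pd.Series(5.0, index=df.index))
    conditions = [
        ((risk == "Low") & (complexity < 6)).to_numpy(),
        ((risk == "Medium") | ((risk == "Low") & (complexity >= 6))).to_numpy(),
    ]
    waves = np.select(conditions, [WAVE_1, WAVE_2], default=WAVE_3)
    return pd.Series(waves, index=df.index, name="Migration_Wave")


def estimate_weeks(total_hours: float, developers: int,
                   hours_per_week: float = 40.0) -> float:
    """Calendar weeks if work splits evenly across developers (no overhead)."""
    if developers <= 0 or hours_per_week <= 0:
        raise ValueError("developers and hours_per_week must be positive")
    return total_hours / (developers * hours_per_week)


# Hypothetical components, for illustration only.
sample = pd.DataFrame({
    "file_name": ["a.sql", "b.sql", "c.sql", "d.sql"],
    "risk_level": ["Low", "Low", "Medium", "High"],
    "complexity_score": [3.0, 7.5, 9.0, 2.0],
    "migration_hours": [40, 80, 120, 240],
})
sample["Migration_Wave"] = assign_waves(sample)
hours_by_wave = sample.groupby("Migration_Wave")["migration_hours"].sum()
weeks_for_three = estimate_weeks(sample["migration_hours"].sum(), developers=3)
print(hours_by_wave)
print(f"{weeks_for_three:.1f} weeks with 3 developers")
```

Keeping the rule in one function would also let the planning and export cells share it instead of each defining its own copy.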

In [None]:
# Comprehensive migration planning with robust error handling
if 'Complexity_Analysis' in sheets_data and len(sheets_data['Complexity_Analysis']) > 0:
    df = sheets_data['Complexity_Analysis'].copy()
    
    print("📋 GLOBALSUPPLY CORP - MIGRATION PLANNING STRATEGY")
    print("="*80)
    
    try:
        # Define migration waves based on complexity and risk
        def assign_migration_wave(row):
            """
            Assign components to migration waves based on complexity and risk
            Wave 1: Low risk, low-medium complexity (Quick Wins)
            Wave 2: Medium risk or higher complexity (Standard Migration)
            Wave 3: High risk or very high complexity (Complex Components)
            """
            risk = row.get('risk_level', 'Medium')
            complexity = row.get('complexity_score', 5.0)
            
            if risk == 'Low' and complexity < 6:
                return 'Wave 1 - Quick Wins'
            elif risk == 'Medium' or (risk == 'Low' and complexity >= 6):
                return 'Wave 2 - Standard Migration'
            else:
                return 'Wave 3 - Complex Components'
        
        df['Migration_Wave'] = df.apply(assign_migration_wave, axis=1)
        
        # Analyze migration waves, aggregating only the columns that exist.
        # (A dict-style agg raises KeyError for a missing column, so the
        # fallbacks are applied per column here rather than inside lambdas.)
        grouped = df.groupby('Migration_Wave')
        wave_analysis = pd.DataFrame({
            'File_Count': grouped.size(),
            'Total_Hours': grouped['migration_hours'].sum() if 'migration_hours' in df.columns else 0,
            'Avg_Complexity': grouped['complexity_score'].mean() if 'complexity_score' in df.columns else 5.0,
            'Total_LOC': grouped['lines_of_code'].sum() if 'lines_of_code' in df.columns else 0
        }).round(1)
        wave_analysis = wave_analysis.reindex([
            'Wave 1 - Quick Wins', 
            'Wave 2 - Standard Migration', 
            'Wave 3 - Complex Components'
        ]).fillna(0)
        
        print("🌊 RECOMMENDED MIGRATION WAVE STRATEGY:")
        print("-" * 80)
        
        wave_descriptions = {
            'Wave 1 - Quick Wins': {
                'emoji': '🟢',
                'description': 'Low risk, straightforward migrations to build momentum',
                'timeline': '2-4 weeks',
                'team': '1-2 developers'
            },
            'Wave 2 - Standard Migration': {
                'emoji': '🟡', 
                'description': 'Standard complexity migrations with moderate risk',
                'timeline': '4-8 weeks',
                'team': '2-3 developers'
            },
            'Wave 3 - Complex Components': {
                'emoji': '🔴',
                'description': 'High complexity/risk components requiring expert attention',
                'timeline': '6-12 weeks',
                'team': '3-4 senior developers'
            }
        }
        
        for wave, row in wave_analysis.iterrows():
            if pd.isna(row['File_Count']) or row['File_Count'] == 0:
                continue
                
            wave_info = wave_descriptions.get(wave, {'emoji': '⚪', 'description': '', 'timeline': '', 'team': ''})
            
            print(f"{wave_info['emoji']} {wave}:")
            print(f"   📄 Components: {int(row['File_Count'])}")
            print(f"   ⏱️  Total Hours: {int(row['Total_Hours'])}")
            print(f"   📊 Avg Complexity: {row['Avg_Complexity']}/10")
            print(f"   📏 Lines of Code: {int(row['Total_LOC']):,}")
            print(f"   📅 Timeline: {wave_info['timeline']}")
            print(f"   👥 Team Size: {wave_info['team']}")
            print(f"   💡 Strategy: {wave_info['description']}")
            print()
        
        # High-risk component analysis (if available)
        if 'risk_level' in df.columns:
            high_risk_files = df[df['risk_level'] == 'High']
            
            if len(high_risk_files) > 0:
                print("⚠️  HIGH-RISK COMPONENTS - SPECIAL ATTENTION REQUIRED:")
                print("-" * 80)
                
                for _, file_data in high_risk_files.iterrows():
                    print(f"🔴 {file_data.get('file_name', 'Unknown')}:")
                    print(f"   🎯 Complexity: {file_data.get('complexity_score', 'N/A')}/10")
                    print(f"   ⏱️  Effort: {file_data.get('migration_hours', 'N/A')} hours")
                    print(f"   🔧 Features: {file_data.get('sql_features', 'Advanced SQL patterns')}")
                    
                    # Specific recommendations based on complexity
                    complexity = file_data.get('complexity_score', 5.0)
                    if complexity > 9:
                        print(f"   💡 Recommendation: Assign senior architect, plan proof-of-concept")
                    else:
                        print(f"   💡 Recommendation: Assign senior developer, thorough testing required")
                    print()
        
        # Cost-benefit analysis with safe calculations
        total_hours = df['migration_hours'].sum() if 'migration_hours' in df.columns else 0
        avg_hourly_rate = 150
        total_cost = total_hours * avg_hourly_rate
        
        print("💰 COST-BENEFIT ANALYSIS:")
        print("-" * 80)
        print(f"📊 Total Migration Effort: {total_hours} hours")
        print(f"💵 Estimated Cost: ${total_cost:,} (@ ${avg_hourly_rate}/hour)")
        print(f"📅 Sequential Timeline: {total_hours/40:.1f} weeks (1 developer)")
        print(f"📅 Parallel Timeline: {total_hours/160:.1f} weeks (4 developers)")
        print(f"📅 Recommended Timeline: {total_hours/120:.1f} weeks (3 developers, before coordination overhead)")
        
        # Expected benefits
        print("\n📈 EXPECTED BUSINESS BENEFITS:")
        print("-" * 50)
        
        benefits = [
            ("Query Performance", "3-5x improvement", "Faster analytics, better user experience"),
            ("Analytics Capability", "Advanced ML/AI", "Predictive supply chain optimization"),
            ("Infrastructure Cost", "20-30% reduction", "Cloud-native scaling and optimization"),
            ("Time-to-Insight", "10x faster", "Natural language queries with Genie"),
            ("Scalability", "Unlimited scale", "Handle peak loads without performance issues"),
            ("Innovation Speed", "2x faster", "Rapid prototyping of new analytics")
        ]
        
        for benefit, improvement, description in benefits:
            print(f"💡 {benefit}: {improvement}")
            print(f"   {description}")
            print()
        
        # ROI calculation
        annual_savings = 200000  # Estimated annual savings
        payback_months = (total_cost / (annual_savings / 12)) if annual_savings > 0 else 0
        
        print(f"💰 ROI PROJECTION:")
        print(f"   Annual Savings Estimate: ${annual_savings:,}")
        print(f"   Payback Period: {payback_months:.1f} months")
        if total_cost > 0:
            print(f"   3-Year ROI: {((annual_savings * 3 - total_cost) / total_cost * 100):.0f}%")
        
    except Exception as e:
        print(f"❌ Error in migration planning: {e}")
        print("This might be due to data format issues, but basic planning completed.")
    
else:
    print("⚠️ Migration planning data not available")
    print("💡 Make sure your assessment report includes complexity analysis data")

## 📝 Executive Summary & Next Steps

Based on the comprehensive assessment analysis, here are the key findings and actionable next steps for GlobalSupply Corp's data modernization journey to Databricks.
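
The financial figures in the summary follow from simple arithmetic on the estimated hours. A minimal sketch of that calculation is shown here, assuming the notebook's placeholder values of $150 per hour and $200,000 in annual savings. The function name `project_roi` and the 480-hour input are hypothetical, and the result is undiscounted.

```python
from typing import Optional


def project_roi(total_hours: float,
                hourly_rate: float = 150.0,
                annual_savings: float = 200_000.0,
                years: int = 3) -> dict:
    """Cost, payback period, and simple (undiscounted) ROI for a migration."""
    total_cost = total_hours * hourly_rate
    monthly_savings = annual_savings / 12

    # Payback: months of savings needed to cover the up-front cost.
    payback_months: Optional[float] = (
        total_cost / monthly_savings if monthly_savings > 0 else None
    )
    # ROI is undefined when there is no cost, so report None instead of dividing by zero.
    roi_pct: Optional[float] = (
        (annual_savings * years - total_cost) / total_cost * 100
        if total_cost > 0 else None
    )
    return {
        "total_cost": total_cost,
        "payback_months": payback_months,
        "roi_pct": roi_pct,
    }


# Hypothetical effort estimate, for illustration only.
projection = project_roi(total_hours=480)
print(f"Cost: ${projection['total_cost']:,.0f}")
print(f"Payback: {projection['payback_months']:.2f} months")
print(f"{3}-year ROI: {projection['roi_pct']:.0f}%")
```

Returning `None` for an undefined ROI is a design choice of this sketch. The notebook cells instead fall back to 0, which reads as "no return" rather than "not computable".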

In [None]:
# Generate executive summary and actionable recommendations with error handling
if 'Complexity_Analysis' in sheets_data and len(sheets_data['Complexity_Analysis']) > 0:
    df = sheets_data['Complexity_Analysis']
    
    try:
        # Calculate key metrics for summary with safe operations
        total_files = len(df)
        total_hours = df['migration_hours'].sum() if 'migration_hours' in df.columns else 0
        total_cost = total_hours * 150
        high_risk_count = len(df[df['risk_level'] == 'High']) if 'risk_level' in df.columns else 0
        avg_complexity = df['complexity_score'].mean() if 'complexity_score' in df.columns else 5.0
        total_loc = df['lines_of_code'].sum() if 'lines_of_code' in df.columns else 0
        
        print("""
═══════════════════════════════════════════════════════════════════════════════
📊 GLOBALSUPPLY CORP - EXECUTIVE SUMMARY & STRATEGIC RECOMMENDATIONS
═══════════════════════════════════════════════════════════════════════════════

🎯 ASSESSMENT FINDINGS:
""")

        # Key findings with specific numbers
        findings = [
            f"• {total_files} SQL components analyzed with {total_loc:,} lines of code",
            f"• Average complexity score: {avg_complexity:.1f}/10 (moderate-to-high complexity)",
            f"• {high_risk_count} high-risk components requiring expert attention",
            f"• Estimated migration effort: {total_hours} hours (${total_cost:,})",
            f"• Strong dependency relationships requiring careful sequencing"
        ]
        
        for finding in findings:
            print(finding)

        payback_months = (total_cost / (200000 / 12)) if total_cost > 0 else 0
        roi_3year = int(((200000 * 3 - total_cost) / total_cost * 100)) if total_cost > 0 else 0
        
        print(f"""
📈 BUSINESS IMPACT & ROI:
• Query Performance: 3-5x improvement for analytical workloads
• Natural Language Queries: Enable business users with Databricks Genie
• ML/AI Capabilities: Advanced supply chain optimization and forecasting
• Infrastructure Costs: 20-30% reduction through cloud-native optimization
• Time-to-Insight: 10x faster analytics development and deployment
• Scalability: Unlimited scale for peak demand scenarios

💰 FINANCIAL PROJECTIONS:
• Investment Required: ${total_cost:,}
• Expected Annual Savings: $200,000+
• Payback Period: {payback_months:.1f} months
• 3-Year ROI: {roi_3year}%""")

        print("""
🚀 STRATEGIC RECOMMENDATIONS:

1. ✅ IMMEDIATE ACTIONS (Next 2 weeks):
   → Secure executive sponsorship and budget approval
   → Assemble migration team with SQL Server + Databricks expertise
   → Set up Databricks workspace and development environment
   → Begin with Module 2: Schema Migration & Transpilation workshop

2. 📋 SHORT-TERM EXECUTION (4-6 weeks):
   → Execute Wave 1 migrations (low complexity, quick wins)
   → Establish CI/CD pipelines for automated testing
   → Begin user training on Databricks platform
   → Proceed to Module 3: Data Reconciliation workshop

3. 🎯 MEDIUM-TERM DELIVERY (2-3 months):
   → Complete Wave 2 & 3 migrations with thorough testing
   → Implement advanced analytics and ML models
   → Deploy natural language query capabilities
   → Complete Module 4: Modern Analytics & ML workshop

4. 🌟 LONG-TERM OPTIMIZATION (3-6 months):
   → Optimize performance and cost efficiency
   → Expand ML/AI use cases across supply chain
   → Train business users on self-service analytics
   → Plan for additional data sources and use cases

🛠️ CRITICAL SUCCESS FACTORS:
• Strong project management with clear milestones
• Dedicated team with both SQL Server and Databricks skills
• Comprehensive testing strategy including data validation
• User training and change management program
• Phased rollout with fallback procedures

📞 RECOMMENDED SUPPORT RESOURCES:
• Databricks Professional Services for complex components
• Lakebridge community and documentation
• Partner ecosystem for specialized migration expertise
• Training programs for team skill development""")

        # Risk mitigation strategies
        if high_risk_count > 0:
            print(f"""
⚠️  RISK MITIGATION FOR {high_risk_count} HIGH-RISK COMPONENTS:
• Assign senior architects to complex components
• Develop proof-of-concepts for high-risk migrations
• Plan for manual testing and validation
• Consider parallel runs during transition period
• Maintain rollback procedures for critical systems""")

        print("""
═══════════════════════════════════════════════════════════════════════════════
🎯 DECISION: PROCEED WITH DATABRICKS MIGRATION

The assessment demonstrates a strong business case for migrating GlobalSupply Corp's
data warehouse to Databricks. The combination of performance improvements, cost 
savings, and advanced analytics capabilities provides compelling ROI.

Next Workshop Module: Schema Migration & Transpilation
═══════════════════════════════════════════════════════════════════════════════
""")
    
    except Exception as e:
        print(f"❌ Error generating executive summary: {e}")
        print("This might be due to missing data columns, but assessment analysis was successful.")
    
else:
    print("⚠️ Assessment data not available for executive summary")
    print("💡 Make sure you have run the assessment analyzer first:")
    print("   python 01_assessment_analyzer.py --generate-samples")

## 📤 Export Results for Stakeholders

Generate reports for different stakeholder groups:
- **Executive Summary** - High-level findings and recommendations
- **Technical Report** - Detailed migration plan with component breakdown
- **Project Plan** - Timeline, resources, and milestone tracking
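
The export step can be sketched as one helper that writes the deliverables into a chosen directory. This is a hedged illustration rather than the notebook's code: `export_reports` is a hypothetical helper, the sample data is made up, and only the two output file names are taken from the cell below. The text file is written as UTF-8 because the summary contains non-ASCII bullet characters.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_reports(df: pd.DataFrame, summary_text: str, out_dir: str = ".") -> dict:
    """Write the detailed plan (CSV) and executive summary (text) to out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    plan_file = out_path / "globalsupply_detailed_migration_plan.csv"
    summary_file = out_path / "globalsupply_executive_summary.txt"

    df.to_csv(plan_file, index=False)
    # Explicit encoding so bullet characters survive on any platform default.
    summary_file.write_text(summary_text, encoding="utf-8")
    return {"plan": plan_file, "summary": summary_file}


# Hypothetical data; a temporary directory keeps the demo from touching real files.
sample = pd.DataFrame({
    "file_name": ["a.sql", "b.sql"],
    "migration_hours": [40, 80],
})
with tempfile.TemporaryDirectory() as tmp:
    written = export_reports(sample, "• 2 SQL components analyzed\n", tmp)
    round_trip = pd.read_csv(written["plan"])
    summary_back = written["summary"].read_text(encoding="utf-8")

print(round_trip)
print(summary_back)
```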

In [None]:
# Export comprehensive results for different stakeholder groups with error handling
if 'Complexity_Analysis' in sheets_data and len(sheets_data['Complexity_Analysis']) > 0:
    df = sheets_data['Complexity_Analysis'].copy()
    
    try:
        # Ensure migration wave assignments exist
        if 'Migration_Wave' not in df.columns:
            def assign_migration_wave(row):
                risk = row.get('risk_level', 'Medium')
                complexity = row.get('complexity_score', 5.0)
                
                if risk == 'Low' and complexity < 6:
                    return 'Wave 1 - Quick Wins'
                elif risk == 'Medium' or (risk == 'Low' and complexity >= 6):
                    return 'Wave 2 - Standard Migration'
                else:
                    return 'Wave 3 - Complex Components'
                    
            df['Migration_Wave'] = df.apply(assign_migration_wave, axis=1)
        
        # 1. Export detailed technical migration plan
        try:
            df.to_csv('globalsupply_detailed_migration_plan.csv', index=False)
            print("✅ Detailed technical plan exported: globalsupply_detailed_migration_plan.csv")
        except Exception as e:
            print(f"⚠️ Could not export CSV: {e}")
        
        # Calculate summary metrics safely
        total_files = len(df)
        total_hours = df['migration_hours'].sum() if 'migration_hours' in df.columns else 0
        total_cost = total_hours * 150
        high_risk_count = len(df[df['risk_level'] == 'High']) if 'risk_level' in df.columns else 0
        total_loc = df['lines_of_code'].sum() if 'lines_of_code' in df.columns else 0
        table_refs = df['table_references'].sum() if 'table_references' in df.columns else 0
        
        # 2. Create executive summary document
        exec_summary = f"""
GLOBALSUPPLY CORP - DATABRICKS MIGRATION ASSESSMENT
Executive Summary Report
Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}
========================================================

PROJECT SCOPE:
• {total_files} SQL components analyzed
• {total_loc:,} total lines of code
• {table_refs} database table dependencies
• Supply chain analytics and reporting workloads

INVESTMENT REQUIRED:
• Development Effort: {total_hours} hours
• Estimated Cost: ${total_cost:,} (labor only, @ $150/hour)
• Timeline: {total_hours/120:.1f} weeks with 3-person team
• Phased Approach: 3 migration waves over 4-5 months

RISK ASSESSMENT:
• {high_risk_count} high-risk components requiring expert attention
• {len(df[df['risk_level'] == 'Medium']) if 'risk_level' in df.columns else 0} medium-risk components for standard migration
• {len(df[df['risk_level'] == 'Low']) if 'risk_level' in df.columns else 0} low-risk components for quick wins
• Comprehensive dependency mapping completed
• Mitigation strategies defined for all risk categories

BUSINESS BENEFITS:
• Performance: 3-5x improvement in query execution
• Analytics: Advanced ML/AI capabilities for supply chain optimization
• User Experience: Natural language queries with Databricks Genie
• Cost Efficiency: 20-30% reduction in infrastructure costs
• Scalability: Unlimited scale for peak demand periods
• Innovation: 2x faster development of new analytics

FINANCIAL PROJECTIONS:
• Annual Cost Savings: $200,000+ (infrastructure + productivity)
• Payback Period: {(total_cost / (200000 / 12)):.1f} months
• 3-Year Net Benefit (undiscounted): ${(200000 * 3 - total_cost):,}
• ROI: {(((200000 * 3 - total_cost) / total_cost * 100) if total_cost > 0 else 0):.0f}% over 3 years

STRATEGIC RECOMMENDATION:
PROCEED with Databricks migration using Lakebridge toolchain.
The analysis demonstrates strong business justification with
manageable technical risk and clear path to success.

KEY SUCCESS FACTORS:
• Executive sponsorship and dedicated team
• Phased migration approach starting with quick wins
• Comprehensive testing and validation procedures
• User training and change management program
• Partnership with Databricks Professional Services

NEXT STEPS:
1. Secure budget approval and team assignment
2. Set up Databricks workspace and development environment  
3. Begin Schema Migration & Transpilation workshop (Module 2)
4. Execute Wave 1 migrations within 4-6 weeks

This assessment provides the foundation for a successful
modernization that will transform GlobalSupply Corp's 
analytics capabilities and competitive advantage.
"""
        
        try:
            with open('globalsupply_executive_summary.txt', 'w', encoding='utf-8') as f:
                f.write(exec_summary)
            print("✅ Executive summary exported: globalsupply_executive_summary.txt")
        except Exception as e:
            print(f"⚠️ Could not export executive summary: {e}")
        
        # 3. Create project tracking template
        if 'Migration_Wave' in df.columns:
            try:
                # Aggregate only the columns that exist: a missing key in the
                # agg dict raises KeyError regardless of the function given
                agg_spec = {'file_name': 'count', 'migration_hours': 'sum', 'complexity_score': 'mean'}
                agg_spec = {col: func for col, func in agg_spec.items() if col in df.columns}
                wave_summary = df.groupby('Migration_Wave').agg(agg_spec).round(1)
                
                wave_summary.to_csv('globalsupply_project_waves.csv')
                print("✅ Project wave summary exported: globalsupply_project_waves.csv")
            except Exception as e:
                print(f"⚠️ Could not export wave summary: {e}")
        
        print("\n📊 ASSESSMENT COMPLETE - Ready for Next Phase")
        print("="*60)
        print("🎯 Deliverables Generated:")
        print("   📄 Executive Summary (for leadership team)")
        print("   📋 Detailed Migration Plan (for technical team)")
        print("   📊 Project Wave Summary (for project managers)")
        print("\n🚀 Next Workshop Module: Schema Migration & Transpilation")
        print("   Location: ../02_transpilation/")
        print("   Focus: Hands-on SQL conversion using Lakebridge")
        
    except Exception as e:
        print(f"❌ Error during export process: {e}")
        print("Assessment analysis was successful, but export encountered issues.")
        
else:
    print("⚠️ No assessment data available for export")
    print("\n📝 To generate real assessment data:")
    print("   1. Export your SQL Server workloads to files")
    print("   2. Run: python 01_assessment_analyzer.py --source-directory /path/to/legacy_sql")
    print("   3. Re-run this notebook with the generated Excel report")