# Excel DataFrame Processor - Jupyter Notebook Example

This notebook demonstrates how to use the Excel DataFrame Processor in Jupyter notebooks for data analysis and visualization.

## Features Covered:
- 📊 Loading Excel files programmatically
- 🔍 Executing SQL queries on Excel data
- 🎨 Using magic commands for convenient querying (with fallback)
- 📈 Data visualization with matplotlib and seaborn
- 📤 Exporting results to CSV
- 🔗 Working with data from multiple Excel files

## Setup and Installation

First, make sure the Excel DataFrame Processor is installed and the sample data has been created:

In [None]:
# Install required packages (run this if needed).
# %pip installs into the environment of the running kernel.
# %pip install pandas openpyxl matplotlib seaborn plotly

# Create sample data (run this if sample_data directory doesn't exist)
# !python create_sample_data.py

## Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from pathlib import Path

# Import Excel DataFrame Processor
from excel_processor.notebook import ExcelProcessor

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

## Method 1: Programmatic Interface

### Initialize the Excel Processor

In [None]:
# Initialize the Excel processor with sample data directory
excel_processor = ExcelProcessor(db_directory='sample_data', memory_limit_mb=512)

print("✅ Excel DataFrame Processor initialized!")
print(f"📁 Database directory: {excel_processor.db_directory}")

### Explore Available Data

In [None]:
# Show all available Excel files and sheets
db_info = excel_processor.show_db()

# Load all files into memory for faster querying
load_info = excel_processor.load_db()

### Basic SQL Queries

In [None]:
# Query 1: View all employees
employees = excel_processor.query("SELECT * FROM employees.staff")
print(f"📊 Total employees: {len(employees)}")

In [None]:
# Query 2: High earners only
high_earners = excel_processor.query(
    "SELECT name, department, salary FROM employees.staff WHERE salary > 70000 ORDER BY salary DESC"
)
print(f"💰 High earners (>$70k): {len(high_earners)}")
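
For cross-checking a filtered result, the same selection can be expressed directly in pandas. The sketch below uses a small hypothetical stand-in for `employees.staff` (the names and salaries are invented for illustration); it assumes only that the sheet has `name`, `department`, and `salary` columns, as the query above suggests.

```python
import pandas as pd

# Hypothetical stand-in for employees.staff; the real sheet's contents will differ.
staff_demo = pd.DataFrame({
    "name": ["Ana", "Ben", "Caro", "Dev"],
    "department": ["Engineering", "Sales", "Engineering", "HR"],
    "salary": [95000, 62000, 71000, 70000],
})

# WHERE salary > 70000  -> boolean mask
# SELECT name, ...      -> column list
# ORDER BY salary DESC  -> sort_values(ascending=False)
high_earners_pd = (
    staff_demo.loc[staff_demo["salary"] > 70000, ["name", "department", "salary"]]
    .sort_values("salary", ascending=False)
    .reset_index(drop=True)
)
print(high_earners_pd)
```

Note that the comparison is strict, so a salary of exactly 70000 is excluded in both the SQL and the pandas version.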

In [None]:
# Query 3: Department summary using pandas (since GROUP BY isn't fully implemented yet)
dept_stats = employees.groupby('department').agg({
    'salary': ['count', 'mean', 'min', 'max'],
    'age': 'mean'
}).round(2)

dept_stats.columns = ['employee_count', 'avg_salary', 'min_salary', 'max_salary', 'avg_age']
print("📈 Department Statistics:")
display(dept_stats)
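
As an alternative to renaming the columns after the fact, pandas named aggregation assigns the output names inside `agg` itself, which avoids depending on the order of the generated columns. This is a sketch on a small hypothetical frame (the values are invented), not the notebook's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the employees DataFrame.
staff_demo = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Sales", "Sales"],
    "salary": [90000, 70000, 60000, 50000],
    "age": [30, 40, 25, 35],
})

# Each keyword is an output column: name=(source_column, aggregation).
dept_stats_named = (
    staff_demo.groupby("department")
    .agg(
        employee_count=("salary", "count"),
        avg_salary=("salary", "mean"),
        min_salary=("salary", "min"),
        max_salary=("salary", "max"),
        avg_age=("age", "mean"),
    )
    .round(2)
)
print(dept_stats_named)
```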

### Working with Multiple Files

In [None]:
# Query orders data
orders = excel_processor.query("SELECT * FROM orders.sales_data")
print(f"📦 Total orders: {len(orders)}")

# Query products data
products = excel_processor.query("SELECT * FROM products.catalog")
inventory = excel_processor.query("SELECT * FROM products.inventory")

print(f"🛍️ Products in catalog: {len(products)}")
print(f"📦 Inventory records: {len(inventory)}")
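
Because each query returns a pandas DataFrame, results from different files can be combined with `DataFrame.merge`. The sketch below is illustrative only: the frames and the column names `product_id`, `quantity`, `product_name`, and `unit_price` are assumptions, since the actual columns of `orders.sales_data` and `products.catalog` are not shown here.

```python
import pandas as pd

# Hypothetical stand-ins for the orders and products query results.
orders_demo = pd.DataFrame({
    "order_id": [1, 2, 3],
    "product_id": [10, 11, 99],
    "quantity": [2, 1, 5],
})
products_demo = pd.DataFrame({
    "product_id": [10, 11],
    "product_name": ["Widget", "Gadget"],
    "unit_price": [4.0, 10.0],
})

# Left join keeps every order; indicator=True adds a _merge column
# that records whether a matching product was found.
joined = orders_demo.merge(products_demo, on="product_id", how="left", indicator=True)
joined["line_total"] = joined["quantity"] * joined["unit_price"]

# Orders whose product_id has no catalog entry.
unmatched = joined[joined["_merge"] == "left_only"]
print(joined)
print(f"Orders without a catalog match: {len(unmatched)}")
```

A left join with `indicator=True` is a reasonable default when checking data quality, because unmatched rows stay visible instead of silently disappearing as they would with an inner join.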

### Test NULL Functionality

In [None]:
# Test Oracle-style NULL checks (if test_nulls.xlsx exists)
try:
    null_test = excel_processor.query("SELECT * FROM test_nulls.staff_with_nulls WHERE name IS NULL")
    print(f"📊 Records with NULL names: {len(null_test)}")
    
    not_null_test = excel_processor.query("SELECT * FROM test_nulls.staff_with_nulls WHERE department IS NOT NULL")
    print(f"📊 Records with non-NULL departments: {len(not_null_test)}")
except Exception as e:
    print(f"⚠️ NULL test data not available: {e}")
    print("Run create_null_test_data.py to create test data with NULL values")
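
The counts from the NULL queries can be sanity-checked in pandas, where missing values are tested with `isna()` and `notna()`. The sketch below uses a small hypothetical frame; it assumes that empty cells arrive as missing values (pandas reads empty Excel cells as `NaN` by default), which may not hold for cells containing empty strings or whitespace.

```python
import pandas as pd

# Hypothetical stand-in for test_nulls.staff_with_nulls.
staff_nulls_demo = pd.DataFrame({
    "name": ["Ana", None, "Caro"],
    "department": ["Sales", "HR", None],
})

# WHERE name IS NULL
null_names = staff_nulls_demo[staff_nulls_demo["name"].isna()]
# WHERE department IS NOT NULL
has_department = staff_nulls_demo[staff_nulls_demo["department"].notna()]

print(f"Records with NULL names: {len(null_names)}")
print(f"Records with non-NULL departments: {len(has_department)}")
```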

## Method 2: Magic Commands (with Fallback)

In [None]:
# Try to load Excel magic commands
try:
    # InteractiveShell.magic() is deprecated; run_line_magic() is the supported call
    get_ipython().run_line_magic('load_ext', 'excel_processor.notebook')
    magic_available = True
    print("✨ Excel magic commands loaded!")
    print("Available commands:")
    print("  %excel_init --db <directory>")
    print("  %excel_show_db")
    print("  %excel_load_db")
    print("  %excel_memory")
    print("  %%excel_sql")
except Exception as e:
    magic_available = False
    print(f"⚠️ Magic commands not available: {e}")
    print("Using programmatic interface instead...")
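
The fallback pattern relies on detecting whether the code is running under IPython at all. A minimal sketch of that check is shown below; `get_shell` is a hypothetical helper name, not part of the Excel DataFrame Processor.

```python
def get_shell():
    """Return the active IPython shell, or None when not running under IPython."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so no magics are possible.
        return None
    # get_ipython() itself returns None in a plain Python process.
    return get_ipython()


shell = get_shell()
if shell is not None:
    print("IPython detected; line magics can be run with shell.run_line_magic(...)")
else:
    print("No IPython shell; use the programmatic interface instead")
```

Checking for the shell explicitly gives a clearer message than catching a generic exception, and it leaves real extension-loading errors visible.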

In [None]:
if magic_available:
    # Initialize with the line magic
    get_ipython().run_line_magic('excel_init', '--db sample_data --memory-limit 512')
    
    # Show database contents
    get_ipython().run_line_magic('excel_show_db', '')
    
    # Load all files
    get_ipython().run_line_magic('excel_load_db', '')
else:
    # Fallback to programmatic interface
    print("Using programmatic interface for magic command examples...")
    excel_processor.show_db()
    excel_processor.load_db()

In [None]:
# Run the query through the programmatic interface, which works whether or not
# the magics loaded. In a notebook with the magics available, the equivalent
# cell magic would be:
#   %%excel_sql
#   SELECT name, department, salary FROM employees.staff WHERE salary > 75000 ORDER BY salary DESC
high_salary_magic = excel_processor.query(
    "SELECT name, department, salary FROM employees.staff WHERE salary > 75000 ORDER BY salary DESC"
)

In [None]:
# Check memory usage
if magic_available:
    get_ipython().run_line_magic('excel_memory', '')
else:
    memory_info = excel_processor.get_memory_usage()
    print("💾 Memory Usage:")
    print(f"  Total: {memory_info['total_mb']:.2f} MB")
    print(f"  Limit: {memory_info['limit_mb']:.2f} MB")
    print(f"  Usage: {memory_info['usage_percent']:.1f}%")
    print(f"  Files loaded: {len(memory_info['files'])}")
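
To see how much memory an individual query result occupies, pandas can report it directly with `DataFrame.memory_usage`. This is a sketch on a hypothetical frame; how the processor itself computes `total_mb` is not shown in this notebook, so the figures need not match exactly.

```python
import pandas as pd

# Hypothetical frame standing in for a loaded sheet.
frame_demo = pd.DataFrame({
    "name": ["Ana", "Ben"] * 1000,
    "salary": [1.0, 2.0] * 1000,
})

# deep=False counts only the array buffers; deep=True also inspects the
# contents of object-dtype columns (for example Python strings).
shallow_bytes = frame_demo.memory_usage(deep=False).sum()
deep_bytes = frame_demo.memory_usage(deep=True).sum()

print(f"Shallow: {shallow_bytes / 1024**2:.3f} MB")
print(f"Deep:    {deep_bytes / 1024**2:.3f} MB")
```

For frames with many text columns, the `deep=True` figure is usually the more realistic one to compare against a memory limit.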

## Data Analysis and Visualization

In [None]:
# Salary distribution by department
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
sns.boxplot(data=employees, x='department', y='salary')
plt.title('Salary Distribution by Department')
plt.xticks(rotation=45)

plt.subplot(1, 2, 2)
dept_counts = employees['department'].value_counts()
plt.pie(dept_counts.values, labels=dept_counts.index, autopct='%1.1f%%')
plt.title('Employee Distribution by Department')

plt.tight_layout()
plt.show()

## Export Results

In [None]:
# Export high earners to CSV: select the rows with SQL, then write them with pandas
high_earners_export = excel_processor.query(
    "SELECT name, department, salary FROM employees.staff WHERE salary > 70000",
    display_result=False
)
high_earners_export.to_csv('high_earners.csv', index=False)
print("✅ Exported high earners to high_earners.csv")

# Export department summary
dept_stats.to_csv('department_summary.csv')
print("✅ Exported department summary to department_summary.csv")
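
The two exports differ in how they treat the index: `index=False` drops it, while the default keeps it as the first CSV column. For a grouped summary the index holds the group labels, so it should be kept. The sketch below shows the difference on a hypothetical summary frame, writing to in-memory buffers instead of files.

```python
import io

import pandas as pd

# Hypothetical summary frame indexed by department, like a groupby result.
summary_demo = pd.DataFrame(
    {"avg_salary": [80000.0, 55000.0]},
    index=pd.Index(["Engineering", "Sales"], name="department"),
)

with_index = io.StringIO()
summary_demo.to_csv(with_index)  # index written as the first column

without_index = io.StringIO()
summary_demo.to_csv(without_index, index=False)  # department labels are lost

print(with_index.getvalue().splitlines()[0])     # department,avg_salary
print(without_index.getvalue().splitlines()[0])  # avg_salary

# Reading the file back with index_col restores the original shape.
restored = pd.read_csv(io.StringIO(with_index.getvalue()), index_col="department")
```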

## Summary

This notebook demonstrated:

✅ **Programmatic Interface**: Using `ExcelProcessor` class for direct Python integration  
✅ **Magic Commands**: Convenient magic commands with fallback to programmatic interface  
✅ **Oracle-style SQL**: Support for `IS NULL` and `IS NOT NULL` operators  
✅ **Data Analysis**: Combining Excel data with pandas for advanced analytics  
✅ **Visualization**: Creating charts and plots with matplotlib and seaborn  
✅ **Export Capabilities**: Saving results to CSV files  
✅ **Memory Management**: Monitoring and controlling memory usage  

### Key Features:
- **Tab Completion**: Intelligent auto-completion for table names, columns, and values
- **Oracle-style Syntax**: Support for `IS NULL`, `IS NOT NULL`, and quoted strings
- **Single Query Mode**: Execute queries from command line with `--query` parameter
- **Error Handling**: Graceful handling of syntax errors and missing data

Happy analyzing! 🚀📊