# sPyTial for Pandas: Structural Understanding of DataFrames

**Why sPyTial beats traditional pandas visualization approaches**

Traditional pandas visualization focuses on *statistical* patterns in your data. sPyTial focuses on *structural* relationships: how your DataFrame is organized, how its columns relate to each other, and what data architecture sits underneath.

## The Problem with Traditional Approaches

When working with pandas DataFrames, most visualizations show you:
- **What the data looks like** (histograms, scatter plots)
- **Statistical relationships** (correlations, distributions)

But they don't show you:
- **How your data is structured** 
- **Relationships between DataFrame components**
- **Data architecture patterns**

sPyTial fills this gap by treating DataFrames as spatial objects with meaningful relationships.
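To make the "values vs. structure" distinction concrete without any sPyTial calls, here is a minimal pandas-only sketch. `describe()` summarizes the *values*. The row count, index type, and per-column dtypes describe the *structure*. The helper name `structure_summary` is hypothetical and exists only for this illustration.

```python
import pandas as pd


def structure_summary(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: report a DataFrame's structure, not its values."""
    return {
        "n_rows": len(frame),
        "index_type": type(frame.index).__name__,
        # Column label -> dtype name, e.g. "int64" or "float64"
        "columns": {col: str(dtype) for col, dtype in frame.dtypes.items()},
    }


frame = pd.DataFrame({
    "employee_id": [1001, 1002, 1003],
    "department": ["Engineering", "Sales", "HR"],
    "salary": [72000.0, 65000.0, 58000.0],
})

# Value-oriented view: statistics over the numeric contents
print(frame.describe())

# Structure-oriented view: row count, index type, and column dtypes
print(structure_summary(frame))
```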

In [None]:
import sys
from pathlib import Path

# Add the parent directory to the Python path
sys.path.append(str(Path().resolve().parent))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from spytial import diagram, orientation, group, attribute, atomColor

## Demo 1: DataFrame Structure Visualization

Let's start with a realistic synthetic dataset and compare the traditional and sPyTial approaches.

In [None]:
# Create a realistic employee dataset
np.random.seed(42)
n_employees = 100

departments = ['Engineering', 'Sales', 'Marketing', 'HR', 'Finance']
levels = ['Junior', 'Mid', 'Senior', 'Lead', 'Director']

df = pd.DataFrame({
    'employee_id': range(1001, 1001 + n_employees),
    'name': [f'Employee_{i}' for i in range(n_employees)],
    'department': np.random.choice(departments, n_employees),
    'level': np.random.choice(levels, n_employees),
    'salary': np.random.normal(75000, 25000, n_employees).clip(min=30000).astype(int),  # floor avoids negative salaries
    'years_experience': np.random.randint(0, 20, n_employees),
    'performance_score': np.random.uniform(1, 5, n_employees).round(2)
})

# Add some realistic salary adjustments based on level
level_multipliers = {'Junior': 0.8, 'Mid': 1.0, 'Senior': 1.3, 'Lead': 1.6, 'Director': 2.0}
df['salary'] = (df['salary'] * df['level'].map(level_multipliers)).astype(int)

print("Traditional pandas DataFrame view:")
print(df.head())
print(f"\nShape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")

### Traditional Visualization: Shows Data Patterns

In [None]:
# Traditional approach: Statistical visualization
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
fig.suptitle('Traditional Pandas Visualization: Data Patterns', fontsize=16)

# Salary distribution
df['salary'].hist(bins=20, ax=axes[0,0])
axes[0,0].set_title('Salary Distribution')
axes[0,0].set_xlabel('Salary')

# Department counts
df['department'].value_counts().plot(kind='bar', ax=axes[0,1])
axes[0,1].set_title('Employees by Department')
axes[0,1].tick_params(axis='x', rotation=45)

# Salary vs Experience
axes[1,0].scatter(df['years_experience'], df['salary'], alpha=0.6)
axes[1,0].set_title('Salary vs Experience')
axes[1,0].set_xlabel('Years Experience')
axes[1,0].set_ylabel('Salary')

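# Performance by Level
df.boxplot(column='performance_score', by='level', ax=axes[1,1])
axes[1,1].set_title('Performance by Level')
# pandas replaces the figure title with "Boxplot grouped by level" when `by` is used,
# so restore the intended title afterwards
fig.suptitle('Traditional Pandas Visualization: Data Patterns', fontsize=16)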

plt.tight_layout()
plt.show()

print("👆 Traditional approach tells us WHAT the data looks like")
print("But it doesn't show us HOW the data is structured!")

### sPyTial Approach: Shows Data Structure

Now let's use sPyTial to visualize the *structural* aspects of our DataFrame.

In [None]:
print("sPyTial DataFrame Visualization: Structure & Relationships")
print("="*60)

# sPyTial shows the DataFrame as a structured object
diagram(df, method="inline")

print("\n👆 sPyTial shows HOW your DataFrame is organized:")
print("- The spatial arrangement of columns")
print("- Relationships between DataFrame components")
print("- The actual structure of your data object")

## Demo 2: Multi-DataFrame Analysis

When working with multiple related DataFrames, sPyTial shines by showing structural relationships that traditional tools miss.
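Some of these relationships are foreign keys. In the demo below, `employee_id` and `project_id` in the assignments table point into the employee and project tables. The following pandas-only sketch checks those links using `merge(..., how='left', indicator=True)`, which labels each row `both` or `left_only`. It uses small hand-made tables rather than the demo's random data, and the function name `dangling_keys` is hypothetical.

```python
import pandas as pd

employees = pd.DataFrame({"employee_id": [1001, 1002, 1003]})
projects = pd.DataFrame({"project_id": ["P001", "P002"]})
assignments = pd.DataFrame({
    "assignment_id": [1, 2, 3],
    "employee_id": [1001, 1003, 1042],       # 1042 has no matching employee
    "project_id": ["P001", "P003", "P002"],  # P003 has no matching project
})


def dangling_keys(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> pd.DataFrame:
    """Hypothetical helper: rows of `child` whose `key` is missing from `parent`."""
    # drop_duplicates keeps the left join from multiplying child rows
    merged = child.merge(parent[[key]].drop_duplicates(), on=key, how="left", indicator=True)
    # "_merge" is "left_only" when the key has no match in the parent table
    return merged.loc[merged["_merge"] == "left_only", child.columns]


print(dangling_keys(assignments, employees, "employee_id"))
print(dangling_keys(assignments, projects, "project_id"))
```

A check like this verifies the links; a structural diagram complements it by showing where each table sits in the workspace.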

In [None]:
# Create related DataFrames - a common real-world scenario

# Projects DataFrame
projects_df = pd.DataFrame({
    'project_id': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'project_name': ['WebApp Redesign', 'Mobile App', 'Data Migration', 'API Development', 'Security Audit'],
    'budget': [50000, 75000, 30000, 60000, 40000],
    'department': ['Engineering', 'Engineering', 'Engineering', 'Engineering', 'Finance']
})

# Project assignments DataFrame  
assignments_df = pd.DataFrame({
    'assignment_id': range(1, 16),
    'employee_id': np.random.choice(df['employee_id'].head(10), 15),  # Random assignments among the 10 employees kept in the workspace
    'project_id': np.random.choice(projects_df['project_id'], 15),
    'role': np.random.choice(['Developer', 'Designer', 'Manager', 'Analyst'], 15),
    'allocation_pct': np.random.choice([25, 50, 75, 100], 15)
})

# Create a data analysis workspace
analysis_workspace = {
    'employees': df.head(10),  # Subset for clarity
    'projects': projects_df,
    'assignments': assignments_df,
    'summary_stats': {
        'total_employees': len(df),
        'total_projects': len(projects_df),
        'avg_salary': df['salary'].mean(),
        'departments': df['department'].unique().tolist()
    }
}

print("Multi-DataFrame Analysis Workspace:")
print(f"- {len(analysis_workspace)} main components")
print(f"- {len(analysis_workspace['employees'])} employees (subset)")
print(f"- {len(analysis_workspace['projects'])} projects")
print(f"- {len(analysis_workspace['assignments'])} assignments")

### Traditional Approach: Multiple Separate Views

In [None]:
# Traditional approach: Show each DataFrame separately
print("Traditional approach: Separate DataFrame views")
print("="*50)

print("\n📊 Projects DataFrame:")
print(projects_df)

print("\n📊 Assignments DataFrame:")
print(assignments_df.head())

print("\n👎 Problems with traditional approach:")
print("- Can't see relationships between DataFrames")
print("- No unified view of data architecture") 
print("- Hard to understand overall data structure")
print("- Each view is isolated")

### sPyTial Approach: Unified Structural View

In [None]:
# sPyTial shows the relationships between all components
print("sPyTial: Unified Analysis Workspace Structure")
print("="*50)

diagram(analysis_workspace, method="inline")

print("\n👍 sPyTial advantages:")
print("- Shows relationships between DataFrames")
print("- Reveals data architecture patterns")
print("- Unified view of entire workspace")
print("- Spatial organization makes structure clear")

## Demo 3: Enhanced DataFrame with Annotations

sPyTial's real power comes from spatial annotations that add semantic meaning to your data structure.
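The exact semantics of sPyTial's annotations are not spelled out here. As an illustration only, the following plain-Python sketch shows the kind of grouping key that an annotation like `groupOn='structure_type'` suggests: every component of a nested workspace is bucketed by the kind of object it is. The helpers `structure_type` and `group_by_structure` and their categories are assumptions, not part of sPyTial's API.

```python
from collections import defaultdict

import pandas as pd


def structure_type(value) -> str:
    """Hypothetical grouping key: classify a component by its structural kind."""
    if isinstance(value, pd.DataFrame):
        return "table"
    if isinstance(value, dict):
        return "mapping"
    return "scalar"


def group_by_structure(workspace: dict) -> dict:
    """Bucket every non-dict entry of a nested dict under its structure type."""
    groups = defaultdict(list)

    def walk(node: dict, prefix: str) -> None:
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            kind = structure_type(value)
            if kind == "mapping":
                walk(value, path)  # descend into nested dicts instead of grouping them
            else:
                groups[kind].append(path)

    walk(workspace, "")
    return dict(groups)


workspace = {
    "data_tables": {
        "projects": pd.DataFrame({"project_id": ["P001"], "budget": [50000]}),
        "assignments": pd.DataFrame({"assignment_id": [1], "project_id": ["P001"]}),
    },
    "metadata": {"schema_version": "2.1"},
    "analytics": {"utilization_rate": 0.85},
}

print(group_by_structure(workspace))
# table  -> data_tables.projects, data_tables.assignments
# scalar -> metadata.schema_version, analytics.utilization_rate
```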

In [None]:
# Create an annotated version of our analysis workspace
# Group related components and add visual cues

annotated_workspace = group(field='data_tables', groupOn='structure_type')(
    atomColor(selector='high_value', value='green')(
        {
            'data_tables': {
                'employees': df.head(5),
                'projects': projects_df,
                'assignments': assignments_df
            },
            'metadata': {
                'data_quality': 'high',
                'last_updated': '2024-01-15',
                'schema_version': '2.1'
            },
            'analytics': {
                'total_budget': projects_df['budget'].sum(),
                'avg_allocation': assignments_df['allocation_pct'].mean(),
                'utilization_rate': 0.85
            }
        }
    )
)

print("Enhanced sPyTial Visualization with Annotations:")
diagram(annotated_workspace, method="inline")

print("\n✨ Enhanced features:")
print("- Grouped related components (data_tables)")
print("- Color coding for important elements")
print("- Clear separation of data vs metadata vs analytics")
print("- Spatial organization reflects logical structure")

## Demo 4: Complex DataFrame Relationships

For complex data analysis scenarios, sPyTial helps visualize intricate relationships that are hard to follow when each intermediate DataFrame is printed on its own.
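One structural change worth watching in the pipeline below is the `department_efficiency` step. It calls `pd.merge` with the default `how='inner'`, which silently drops departments that have no projects. The following pandas-only sketch uses toy numbers (not the demo data) to track the shape of each stage so that kind of change is visible.

```python
import pandas as pd

raw = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Sales", "Finance"],
    "salary": [90000, 80000, 60000, 70000],
    "performance_score": [4.0, 3.0, 3.5, 4.5],
})
projects = pd.DataFrame({
    "department": ["Engineering", "Engineering", "Finance"],
    "budget": [50000, 75000, 40000],
})

# Processed stage: one row per department
perf = raw.groupby("department")["performance_score"].mean().reset_index()
budget = projects.groupby("department")["budget"].sum().reset_index()

# Results stage: the inner join keeps only departments present in both tables
efficiency = pd.merge(perf, budget, on="department")

for name, frame in [("raw", raw), ("perf", perf), ("budget", budget), ("efficiency", efficiency)]:
    print(f"{name:<10} shape={frame.shape} columns={list(frame.columns)}")
```

Here `Sales` appears in `perf` but not in `efficiency`, because it has no rows in `projects`.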

In [None]:
# Create a complex data analysis pipeline
pipeline_data = {
    'raw_data': {
        'source_employees': df,
        'source_projects': projects_df
    },
    'processed_data': {
        'employee_summary': df.groupby('department')['salary'].agg(['mean', 'count']).reset_index(),
        'project_budget_analysis': projects_df.groupby('department')['budget'].sum().reset_index()
    },
    'results': {
        'department_efficiency': pd.merge(
            df.groupby('department')['performance_score'].mean().reset_index(),
            projects_df.groupby('department')['budget'].sum().reset_index(),
            on='department'
        ),
        'insights': {
            'highest_paid_dept': df.groupby('department')['salary'].mean().idxmax(),
            'most_projects_dept': projects_df['department'].value_counts().index[0],
            'correlation_perf_budget': 0.67  # Placeholder value for illustration, not computed from the data
        }
    }
}

print("Complex Data Pipeline Structure:")
diagram(pipeline_data, method="inline")

print("\n🔍 What sPyTial reveals:")
print("- Data flow from raw → processed → results")
print("- Relationships between different analysis stages")
print("- Structure of complex analytical workflows")
print("- How DataFrames transform through the pipeline")

## Summary: Why sPyTial Beats Traditional Pandas Visualization

| Traditional Approach | sPyTial Approach |
|---------------------|------------------|
| Shows **what** data looks like | Shows **how** data is structured |
| Statistical patterns | Architectural patterns |
| Isolated views | Unified structural views |
| Data content focus | Data relationship focus |
| Good for analyzing values | Good for understanding structure |

### When to Use sPyTial:

✅ **Understanding DataFrame structure**  
✅ **Debugging complex data pipelines**  
✅ **Documenting data architecture**  
✅ **Teaching pandas concepts**  
✅ **Exploring multi-DataFrame relationships**  
✅ **Code reviews involving data structures**  

### When to Use Traditional Tools:

📊 **Statistical analysis**  
📊 **Pattern discovery in data values**  
📊 **Publication-ready charts**  
📊 **Time series analysis**  

**The key insight**: sPyTial and traditional visualization are *complementary*. Use traditional tools to understand your data values; use sPyTial to understand your data structure.