# 📊 Data Visualization Practice: Grammar of Graphics and Practical EDA

## Learning Objectives
- Understand and practice Grammar of Graphics concepts
- Master various chart types and their applications
- Perform EDA with real data
- Implement interactive visualizations
- Create dashboard-style comprehensive visualizations

## Practice Structure (Total 10 Labs)
### Part A: Grammar of Graphics Fundamentals
1. Core Concepts of Grammar of Graphics
2. Scales and Coordinate Transformations
3. Faceting and Small Multiples

### Part B: Practical EDA Visualizations
4. Univariate Analysis - Exploring Distributions
5. Bivariate Relationships - Discovering Correlations
6. Categorical Data - Comparisons and Compositions
7. Time Series Analysis - Trends and Patterns

### Part C: Advanced Visualization Techniques
8. Interactive Visualizations - Using Plotly
9. Geographic Data Visualization
10. Dashboard-Style Comprehensive Visualizations

---

## 0. Environment Setup and Library Imports

In [None]:
# Install required libraries (if needed)
# !pip install pandas numpy matplotlib seaborn plotly scipy scikit-learn

# Basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Visualization style settings
plt.style.use('seaborn-v0_8-darkgrid')  # matplotlib >= 3.6 name; older versions call it 'seaborn-darkgrid'
sns.set_palette('husl')
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False

print('✅ Libraries loaded successfully!')
print(f'Pandas version: {pd.__version__}')
print(f'NumPy version: {np.__version__}')
print(f'Seaborn version: {sns.__version__}')

---
# Part A: Grammar of Graphics Fundamentals

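The Grammar of Graphics, introduced by Leland Wilkinson in his book *The Grammar of Graphics* (1999), describes any statistical chart as a combination of independent components (data, aesthetic mappings, geometries, scales, and so on) rather than as a fixed chart type. Hadley Wickham's ggplot2 for R is its best-known implementation.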

## Lab 1: Core Concepts of Grammar of Graphics 🎨

### 📚 Concept Explanation
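Key components of the Grammar of Graphics (using ggplot2's names):
- **Data**: The data to be visualized
- **Aesthetics (aes)**: Mappings from data variables to visual properties (x, y, color, size, etc.)
- **Geometries (geom)**: Geometric objects that represent the data (points, lines, bars, etc.)
- **Statistics (stat)**: Summaries computed from the data before drawing (counts, bins, smoothers)
- **Scales**: Functions that translate data values into visual values (positions, colors, sizes)
- **Coordinates (coord)**: The space the data is drawn in (Cartesian, polar, map projections)
- **Facets**: Splitting the data into subsets, each drawn in its own panel
- **Themes**: Non-data visual styling (fonts, backgrounds, grid lines)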
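The **Scales** component is easiest to see when written by hand. The sketch below shows two scales. The first is a continuous linear scale that maps a data interval onto a pixel range. The second is a discrete scale that maps categories to colors. `linear_scale` and `ordinal_color_scale` are illustrative helpers, not part of any library. Plotting libraries implement the same idea internally.

```python
import numpy as np

def linear_scale(domain, range_):
    """Return a function mapping the data interval `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(values):
        values = np.asarray(values, dtype=float)
        return r0 + (values - d0) * (r1 - r0) / (d1 - d0)

    return scale

def ordinal_color_scale(categories, palette):
    """Return a function mapping each category to a palette color (cycling if needed)."""
    mapping = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return lambda values: [mapping[v] for v in values]

# Aesthetic mapping x = sepal_length: data range 4-8 cm -> a 400-pixel-wide panel
x_scale = linear_scale(domain=(4.0, 8.0), range_=(0.0, 400.0))
print(x_scale([4.0, 6.0, 8.0]))  # positions 0, 200 and 400 pixels

# Aesthetic mapping color = species
color_scale = ordinal_color_scale(['setosa', 'versicolor', 'virginica'],
                                  ['tab:blue', 'tab:orange', 'tab:green'])
print(color_scale(['virginica', 'setosa']))  # ['tab:green', 'tab:blue']
```

A log scale follows the same pattern: apply `np.log` to the domain and to the values, then map linearly.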

In [None]:
# Prepare data
iris = sns.load_dataset('iris')
print("📊 Iris Dataset Structure:")
print(iris.head())
print(f"\nData size: {iris.shape}")
print(f"Species types: {iris['species'].unique()}")

In [None]:
# Building layers in Grammar of Graphics style
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# 1. Data + Aesthetics only
ax1 = axes[0]
ax1.set_xlim(iris['sepal_length'].min()-0.5, iris['sepal_length'].max()+0.5)
ax1.set_ylim(iris['sepal_width'].min()-0.5, iris['sepal_width'].max()+0.5)
ax1.set_xlabel('Sepal Length')
ax1.set_ylabel('Sepal Width')
ax1.set_title('1. Data + Aesthetics (axes only)')
ax1.grid(True, alpha=0.3)

# 2. + Geometry (add points)
ax2 = axes[1]
ax2.scatter(iris['sepal_length'], iris['sepal_width'], alpha=0.6)
ax2.set_xlabel('Sepal Length')
ax2.set_ylabel('Sepal Width')
ax2.set_title('2. + Geometry (add points)')
ax2.grid(True, alpha=0.3)

# 3. + Color Aesthetic (color by species)
ax3 = axes[2]
for species in iris['species'].unique():
    data = iris[iris['species'] == species]
    ax3.scatter(data['sepal_length'], data['sepal_width'], 
               label=species, alpha=0.6, s=50)
ax3.set_xlabel('Sepal Length')
ax3.set_ylabel('Sepal Width')
ax3.set_title('3. + Color Aesthetic (by species)')
ax3.legend()
ax3.grid(True, alpha=0.3)

plt.suptitle('Grammar of Graphics: Layer-by-Layer Construction', fontsize=16, y=1.02)
plt.tight_layout()
plt.show()

print("💡 Analysis Points:")
print("- Visualization becomes richer by adding layers one by one")
print("- Species clusters become clearly visible when color aesthetic is added")

## Lab 2: Scales and Coordinate Transformations 📐

### 📚 Concept Explanation
- **Scale Transformations**: Make data patterns more visible through linear, log, square root, and other transformations
- **Coordinate Systems**: Coordinate system transformations such as Cartesian and polar coordinates
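The cells in this lab only change scales. A coordinate system decides where a mapped (x, y) pair lands on the page. The sketch below shows the polar transform itself: the x aesthetic becomes an angle and the y aesthetic becomes a radius. This is how a bar chart turns into a rose (coxcomb) chart. `polar_to_cartesian` is an illustrative helper. In Matplotlib you would normally let a polar axis do this, via `plt.subplots(subplot_kw={'projection': 'polar'})`.

```python
import numpy as np

def polar_to_cartesian(theta, r):
    """Polar coordinate transform: (angle in radians, radius) -> Cartesian (x, y)."""
    theta = np.asarray(theta, dtype=float)
    r = np.asarray(r, dtype=float)
    return r * np.cos(theta), r * np.sin(theta)

# Four categories, each given an equal angular slice (0, 90, 180, 270 degrees);
# each category's value becomes the radius, as in a rose chart
values = np.array([3.0, 1.0, 2.0, 4.0])
theta = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
x, y = polar_to_cartesian(theta, values)
print(np.round(x, 3), np.round(y, 3))
```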

In [None]:
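# Generate exponential-growth data (y grows like e^x; this is not an exponential *distribution*)
np.random.seed(42)
exponential_data = pd.DataFrame({
    'x': np.linspace(1, 100, 100),
    # Multiplicative (log-normal) noise keeps every y strictly positive,
    # so no points are dropped on the log-scale panels below
    'y': np.exp(np.linspace(0, 5, 100)) * np.random.lognormal(0, 0.2, 100),
    'category': np.repeat(['A', 'B', 'C', 'D', 'E'], 20)
})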

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# 1. Linear scale
axes[0,0].scatter(exponential_data['x'], exponential_data['y'], alpha=0.6)
axes[0,0].set_title('Linear Scale')
axes[0,0].set_xlabel('X')
axes[0,0].set_ylabel('Y')
axes[0,0].grid(True, alpha=0.3)

# 2. Y-axis log scale
axes[0,1].scatter(exponential_data['x'], exponential_data['y'], alpha=0.6, color='orange')
axes[0,1].set_yscale('log')
axes[0,1].set_title('Log Scale Y-axis')
axes[0,1].set_xlabel('X')
axes[0,1].set_ylabel('Y (log scale)')
axes[0,1].grid(True, alpha=0.3)

# 3. Both axes log scale
axes[1,0].scatter(exponential_data['x'], np.abs(exponential_data['y']), alpha=0.6, color='green')
axes[1,0].set_xscale('log')
axes[1,0].set_yscale('log')
axes[1,0].set_title('Log-Log Scale')
axes[1,0].set_xlabel('X (log scale)')
axes[1,0].set_ylabel('Y (log scale)')
axes[1,0].grid(True, alpha=0.3)

# 4. Square root transform (applied to the data values, not to the axis scale)
axes[1,1].scatter(exponential_data['x'], np.sqrt(np.abs(exponential_data['y'])), 
                 alpha=0.6, color='red')
axes[1,1].set_title('Square Root Transform')
axes[1,1].set_xlabel('X')
axes[1,1].set_ylabel('sqrt(Y)')
axes[1,1].grid(True, alpha=0.3)

plt.suptitle('Effects of Scale Transformations', fontsize=16)
plt.tight_layout()
plt.show()

print("💡 When to use scale transformations:")
print("- Log scale: Exponential growth/decay patterns, data spanning multiple orders of magnitude")
print("- Square root transform: When variance is proportional to the mean")
print("- Log-log: Identifying power law relationships")
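You can check numerically why these transforms work. Taking logs turns exponential growth into a straight line whose slope is the growth rate. It turns a power law into a straight line on log-log axes whose slope is the exponent. The sketch below uses noise-free synthetic curves with made-up parameters and recovers those parameters with `np.polyfit`.

```python
import numpy as np

x = np.linspace(1, 100, 100)
y_exp = 2.0 * np.exp(0.05 * x)   # exponential growth: y = a * e^(b*x)
y_pow = 3.0 * x ** 1.5           # power law: y = c * x^k

# log(y) = log(a) + b*x  -> straight line on a log-y axis with slope b
b, log_a = np.polyfit(x, np.log(y_exp), 1)
# log(y) = log(c) + k*log(x)  -> straight line on log-log axes with slope k
k, log_c = np.polyfit(np.log(x), np.log(y_pow), 1)

print(f"growth rate b = {b:.3f}, a = {np.exp(log_a):.3f}")
print(f"power-law exponent k = {k:.3f}, c = {np.exp(log_c):.3f}")
```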

## Lab 3: Faceting and Small Multiples 🔲

### 📚 Concept Explanation
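Faceting splits one dataset into subsets by the values of one or more categorical variables. It then draws the same plot for each subset in its own panel, usually with shared axes so the panels can be compared directly.
- **facet_wrap** (ggplot2 term): One panel per level of a single variable, laid out in a 1D sequence that wraps into rows. In Seaborn this is `FacetGrid(col=..., col_wrap=n)`.
- **facet_grid** (ggplot2 term): A 2D grid with one variable on the rows and another on the columns. In Seaborn this is `FacetGrid(row=..., col=...)`.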
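The two layouts differ only in how panel positions are assigned. The helpers below compute the (row, column) of each panel under each strategy. `facet_wrap_layout` and `facet_grid_layout` are illustrative names, not library functions.

```python
import math
from itertools import product

def facet_wrap_layout(levels, ncol):
    """facet_wrap: place one panel per level in reading order, `ncol` panels per row."""
    nrow = math.ceil(len(levels) / ncol)
    positions = {level: divmod(i, ncol) for i, level in enumerate(levels)}
    return nrow, positions

def facet_grid_layout(row_levels, col_levels):
    """facet_grid: one row per row-variable level and one column per column-variable
    level; every combination gets a panel, even when it has no data."""
    return {(r, c): (i, j)
            for (i, r), (j, c) in product(enumerate(row_levels), enumerate(col_levels))}

nrow, wrap = facet_wrap_layout(['Thur', 'Fri', 'Sat', 'Sun'], ncol=2)
print(nrow, wrap)   # 2 {'Thur': (0, 0), 'Fri': (0, 1), 'Sat': (1, 0), 'Sun': (1, 1)}

grid = facet_grid_layout(['Thur', 'Fri', 'Sat', 'Sun'], ['Lunch', 'Dinner'])
print(len(grid), grid[('Sat', 'Lunch')])   # 8 (2, 0)
```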

In [None]:
# Faceting with Seaborn
tips = sns.load_dataset('tips')

print("🍽️ Tips Dataset:")
print(tips.head())
print(f"\nColumns: {tips.columns.tolist()}")
print(f"Data size: {tips.shape}")

In [None]:
# Facet Grid: one panel per (day, time) combination
g = sns.FacetGrid(tips, col='time', row='day', height=3, aspect=1.2)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.6)
# No hue variable is mapped, so there is no legend to add
g.figure.suptitle('Tips by Day and Time - Facet Grid', y=1.02, fontsize=14)
plt.tight_layout()
plt.show()

print("\n💡 Faceting advantages:")
print("- Easy comparison across conditions (panels share axes by default)")
print("- Identify patterns in each subgroup")
print("- Reduce overplotting by spreading points across panels")
print("- Empty panels (e.g. weekend lunch) mean that combination has no data")

In [None]:
# FacetGrid with a different plot type: histograms wrapped into 2 columns
g = sns.FacetGrid(tips, col='day', col_wrap=2, height=4)
g.map_dataframe(sns.histplot, x='total_bill', bins=15, kde=True)
g.set_axis_labels('Total Bill', 'Count')
g.figure.suptitle('Total Bill Distribution by Day of Week', y=1.02, fontsize=14)
plt.tight_layout()
plt.show()

print("\n💡 Average bill by day:")
for day, avg_bill in tips.groupby('day', observed=True)['total_bill'].mean().items():
    print(f"- {day}: ${avg_bill:.2f}")

---
# Part B: Practical EDA Visualizations

Exploratory Data Analysis (EDA) is the process of discovering patterns, anomalies, and insights in data through visualization.

## Lab 4: Univariate Analysis - Exploring Distributions 📈

### 📚 Concept Explanation
Univariate analysis examines the characteristics of a single variable:
- **Distribution shape**: Normal, skewed, bimodal
- **Central tendency**: Mean, median, mode
- **Spread**: Range, variance, standard deviation
- **Outliers**: Extreme values
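The box plots in this lab use Tukey's rule, the default `whis=1.5` in both Matplotlib and Seaborn. The whiskers reach the most extreme points within 1.5 × IQR of the box, and any point beyond that is drawn individually. Below is a minimal pandas sketch of that rule on a small hypothetical sample, together with the mean/median comparison that hints at skew. `iqr_outliers` is an illustrative helper.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Small hypothetical sample with one extreme value
s = pd.Series([10, 12, 11, 13, 12, 11, 10, 50])

print(f"mean={s.mean():.3f}, median={s.median():.1f}, skewness={s.skew():.2f}")
print(iqr_outliers(s).tolist())  # [50]
```

The single extreme value pulls the mean well above the median, which is the signature of a right-skewed sample. The median stays put.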

In [None]:
# Generate various distribution data
np.random.seed(42)
dist_data = pd.DataFrame({
    'normal': np.random.normal(100, 15, 1000),
    'skewed': np.random.exponential(2, 1000),
    'bimodal': np.concatenate([np.random.normal(50, 10, 500), 
                               np.random.normal(80, 10, 500)]),
    'uniform': np.random.uniform(0, 100, 1000)
})

print("📊 Distribution Data Statistics:")
print(dist_data.describe())

In [None]:
# Distribution visualization: Multiple approaches
fig, axes = plt.subplots(2, 4, figsize=(16, 8))

for idx, col in enumerate(dist_data.columns):
    # Histogram
    axes[0, idx].hist(dist_data[col], bins=30, alpha=0.7, edgecolor='black')
    axes[0, idx].set_title(f'{col.capitalize()} - Histogram')
    axes[0, idx].set_xlabel('Value')
    axes[0, idx].set_ylabel('Frequency')

    # Box plot (vertical by default; whiskers at 1.5 x IQR)
    axes[1, idx].boxplot(dist_data[col])
    axes[1, idx].set_title(f'{col.capitalize()} - Box Plot')
    axes[1, idx].set_ylabel('Value')
    axes[1, idx].grid(axis='y', alpha=0.3)

plt.suptitle('Univariate Distribution Analysis - Multiple Views', fontsize=16, y=1.00)
plt.tight_layout()
plt.show()

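print("\n💡 Distribution characteristics:")
print("- Normal: Symmetric bell curve (mean ≈ median)")
print("- Skewed (exponential): Long right tail; the mean is pulled above the median")
print("- Bimodal: Two distinct peaks, which a box plot alone cannot reveal")
print("- Uniform: Values spread evenly across the range (flat histogram)")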

In [None]:
# Violin plot: Combine box plot and distribution
fig, ax = plt.subplots(figsize=(12, 6))
dist_data_melted = dist_data.melt(var_name='Distribution', value_name='Value')
sns.violinplot(data=dist_data_melted, x='Distribution', y='Value', ax=ax)
ax.set_title('Distribution Comparison - Violin Plot', fontsize=14)
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n💡 Violin plot shows:")
print("- Width represents frequency at each value")
print("- Box plot information (quartiles) included")
print("- Easy to compare multiple distributions")
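Both `kde=True` in the histograms and the violin shapes come from kernel density estimation. The idea is to place a small Gaussian "bump" on every observation and average the bumps. Below is a minimal NumPy sketch. It assumes a Gaussian kernel and Scott's rule of thumb for the bandwidth (σ · n^(-1/5)), which is the default rule in `scipy.stats.gaussian_kde` and in Seaborn's `kdeplot`. `gaussian_kde_1d` is an illustrative helper.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Kernel density estimate: the average of Gaussian bumps centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)           # Scott's rule bandwidth
    z = (grid[:, None] - samples[None, :]) / h        # shape (len(grid), n)
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * h)

rng = np.random.default_rng(42)
samples = rng.normal(100, 15, 1000)                   # like the 'normal' column above
grid = np.linspace(20, 180, 801)
density = gaussian_kde_1d(samples, grid)

dx = grid[1] - grid[0]
print(f"area under curve ≈ {density.sum() * dx:.3f}")  # a density integrates to ~1
print(f"peak at ≈ {grid[density.argmax()]:.1f}")
```

A larger bandwidth gives a smoother curve that can hide features such as the two peaks of the bimodal column. A smaller one gives a wigglier curve.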

## Lab 5: Bivariate Relationships - Discovering Correlations 🔗

### 📚 Concept Explanation
Bivariate analysis examines the relationship between two variables:
- **Correlation**: Strength and direction of a *linear* relationship (Pearson's r)
- **Patterns**: Linear, nonlinear, clustered
- **Dependencies**: How one variable changes together with another (association, not necessarily causation)

In [None]:
# Generate bivariate relationship data
np.random.seed(42)
n = 200

bivar_data = pd.DataFrame({
    'x': np.linspace(0, 10, n),
    'linear_pos': np.linspace(0, 10, n) * 2 + np.random.normal(0, 2, n),
    'linear_neg': -np.linspace(0, 10, n) * 1.5 + np.random.normal(0, 2, n),
    'quadratic': (np.linspace(0, 10, n) - 5)**2 + np.random.normal(0, 3, n),
    'no_correlation': np.random.normal(0, 5, n)
})

print("📊 Bivariate Data Correlations:")
print(bivar_data.corr()[['linear_pos', 'linear_neg', 'quadratic', 'no_correlation']].loc['x'])

In [None]:
# 2x2 grid of scatter plots, one per relationship type
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Positive correlation
axes[0,0].scatter(bivar_data['x'], bivar_data['linear_pos'], alpha=0.6, color='blue')
axes[0,0].set_title('Positive Linear Correlation')
axes[0,0].set_xlabel('X')
axes[0,0].set_ylabel('Y')
axes[0,0].grid(True, alpha=0.3)
r_pos = bivar_data[['x', 'linear_pos']].corr().iloc[0,1]
axes[0,0].text(0.05, 0.95, f'r = {r_pos:.3f}', transform=axes[0,0].transAxes, 
              verticalalignment='top', fontsize=12, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Negative correlation
axes[0,1].scatter(bivar_data['x'], bivar_data['linear_neg'], alpha=0.6, color='red')
axes[0,1].set_title('Negative Linear Correlation')
axes[0,1].set_xlabel('X')
axes[0,1].set_ylabel('Y')
axes[0,1].grid(True, alpha=0.3)
r_neg = bivar_data[['x', 'linear_neg']].corr().iloc[0,1]
axes[0,1].text(0.05, 0.95, f'r = {r_neg:.3f}', transform=axes[0,1].transAxes,
              verticalalignment='top', fontsize=12, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Nonlinear relationship
axes[1,0].scatter(bivar_data['x'], bivar_data['quadratic'], alpha=0.6, color='green')
axes[1,0].set_title('Nonlinear Relationship (Quadratic)')
axes[1,0].set_xlabel('X')
axes[1,0].set_ylabel('Y')
axes[1,0].grid(True, alpha=0.3)
r_quad = bivar_data[['x', 'quadratic']].corr().iloc[0,1]
axes[1,0].text(0.05, 0.95, f'r = {r_quad:.3f}\n(Linear correlation low)', 
              transform=axes[1,0].transAxes, verticalalignment='top', fontsize=12,
              bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# No correlation
axes[1,1].scatter(bivar_data['x'], bivar_data['no_correlation'], alpha=0.6, color='purple')
axes[1,1].set_title('No Correlation')
axes[1,1].set_xlabel('X')
axes[1,1].set_ylabel('Y')
axes[1,1].grid(True, alpha=0.3)
r_no = bivar_data[['x', 'no_correlation']].corr().iloc[0,1]
axes[1,1].text(0.05, 0.95, f'r = {r_no:.3f}', transform=axes[1,1].transAxes,
              verticalalignment='top', fontsize=12, bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.suptitle('Types of Bivariate Relationships', fontsize=16)
plt.tight_layout()
plt.show()

print("\n💡 Correlation coefficient interpretation:")
print("- r = 1: Perfect positive correlation")
print("- r = 0: No linear correlation")
print("- r = -1: Perfect negative correlation")
print("- Note: Correlation only measures LINEAR relationships!")
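The note above is easy to verify. Pearson's r measures linear association. Spearman's rank correlation, which is Pearson's r computed on ranks, captures any monotonic relationship. Neither one detects a U-shaped pattern, which is why plotting the data matters. Below is a small sketch with made-up curves, using pandas' built-in `method='spearman'`.

```python
import numpy as np
import pandas as pd

x = np.arange(11, dtype=float)      # 0, 1, ..., 10 (exactly symmetric around 5)
df = pd.DataFrame({
    'x': x,
    'cubic': x ** 3,                # monotonic but curved
    'u_shape': (x - 5) ** 2,        # strong but non-monotonic relationship
})

pearson = df.corr(method='pearson').loc['x', ['cubic', 'u_shape']]
spearman = df.corr(method='spearman').loc['x', ['cubic', 'u_shape']]
print(pd.DataFrame({'pearson': pearson, 'spearman': spearman}).round(3))
```

For the cubic curve, Spearman is exactly 1 while Pearson falls below 1. For the U shape, both coefficients are 0, even though y is completely determined by x.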

In [None]:
# Correlation heatmap
iris_numeric = iris.select_dtypes(include=[np.number])
corr_matrix = iris_numeric.corr()

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, 
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=ax)
ax.set_title('Iris Dataset - Correlation Matrix', fontsize=14, pad=20)
plt.tight_layout()
plt.show()

print("\n💡 Strong correlations in Iris dataset:")
for i in range(len(corr_matrix.columns)):
    for j in range(i+1, len(corr_matrix.columns)):
        if abs(corr_matrix.iloc[i,j]) > 0.8:
            print(f"- {corr_matrix.columns[i]} ↔ {corr_matrix.columns[j]}: {corr_matrix.iloc[i,j]:.3f}")

## Lab 6: Categorical Data - Comparisons and Compositions 📊

### 📚 Concept Explanation
Categorical data visualization focuses on:
- **Comparisons**: Comparing values across groups
- **Distributions**: Distribution within each category
- **Compositions**: Proportions and parts-to-whole relationships
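Each of these views starts from a different pandas summary. A comparison needs one number per group, which becomes a bar chart. A composition needs each group's shares normalized to sum to 1, which becomes a 100% stacked bar chart. Below is a minimal sketch on a tiny hypothetical `orders` table that uses the same column names as the Tips dataset.

```python
import pandas as pd

# Hypothetical mini dataset for illustration only
orders = pd.DataFrame({
    'time':   ['Lunch', 'Lunch', 'Dinner', 'Dinner', 'Dinner', 'Dinner'],
    'smoker': ['No', 'Yes', 'No', 'No', 'No', 'Yes'],
    'tip':    [2.0, 3.0, 4.0, 5.0, 3.0, 6.0],
})

# Comparison: one summary value per group (-> bar chart)
avg_tip = orders.groupby('time')['tip'].mean()

# Composition: share of each smoker status within each time (-> 100% stacked bar)
share = pd.crosstab(orders['time'], orders['smoker'], normalize='index')

print(avg_tip)
print(share)
```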

In [None]:
# Use Tips dataset for categorical analysis
print("📊 Categorical Variables in Tips Dataset:")
print(f"- day: {tips['day'].unique()}")
print(f"- time: {tips['time'].unique()}")
print(f"- sex: {tips['sex'].unique()}")
print(f"- smoker: {tips['smoker'].unique()}")
print("\nSummary (count, number of unique values, most frequent value and its frequency):")
print(tips[['day', 'time', 'sex', 'smoker']].describe())

In [None]:
# Bar charts: Compare categories
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Average tip by day
day_avg = tips.groupby('day')['tip'].mean().sort_values()
axes[0,0].barh(day_avg.index, day_avg.values, color='skyblue', edgecolor='black')
axes[0,0].set_xlabel('Average Tip ($)')
axes[0,0].set_title('Average Tip by Day of Week')
axes[0,0].grid(axis='x', alpha=0.3)

# 2. Number of customers by time and smoker status
time_smoker = tips.groupby(['time', 'smoker']).size().unstack()
time_smoker.plot(kind='bar', ax=axes[0,1], color=['salmon', 'lightgreen'])
axes[0,1].set_title('Number of Customers by Time and Smoker Status')
axes[0,1].set_xlabel('Time')
axes[0,1].set_ylabel('Count')
axes[0,1].legend(title='Smoker')
axes[0,1].tick_params(axis='x', rotation=0)

# 3. Box plot: Tip distribution by day
sns.boxplot(data=tips, x='day', y='tip', ax=axes[1,0], palette='Set2')
axes[1,0].set_title('Tip Distribution by Day')
axes[1,0].set_ylabel('Tip ($)')
axes[1,0].grid(axis='y', alpha=0.3)

# 4. Stacked bar chart: Gender ratio by time
sex_time = pd.crosstab(tips['time'], tips['sex'], normalize='index') * 100
sex_time.plot(kind='bar', stacked=True, ax=axes[1,1], color=['#FF9999', '#66B2FF'])
axes[1,1].set_title('Gender Ratio by Time (%)')
axes[1,1].set_ylabel('Percentage')
axes[1,1].set_xlabel('Time')
axes[1,1].legend(title='Gender')
axes[1,1].tick_params(axis='x', rotation=0)

plt.suptitle('Categorical Data Analysis - Multiple Perspectives', fontsize=16, y=1.00)
plt.tight_layout()
plt.show()

print("\n💡 Key insights:")
print(f"- Highest average tip day: {day_avg.idxmax()} (${day_avg.max():.2f})")
print(f"- Total dinner customers: {tips[tips['time']=='Dinner'].shape[0]}")
print(f"- Total lunch customers: {tips[tips['time']=='Lunch'].shape[0]}")

## Lab 7: Time Series Analysis - Trends and Patterns 📅

### 📚 Concept Explanation
Time series analysis identifies:
- **Trend**: Long-term increase or decrease
- **Seasonality**: Regular periodic patterns
- **Volatility**: Degree of variation
- **Anomalies**: Unusual deviations

In [None]:
# Generate time series data
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=365*3, freq='D')

# Components
trend = np.linspace(100, 300, len(dates))
seasonality = 50 * np.sin(np.arange(len(dates)) * 2 * np.pi / 365)
noise = np.random.normal(0, 10, len(dates))

ts_data = pd.DataFrame({
    'date': dates,
    'value': trend + seasonality + noise,
    'trend': trend,
    'seasonality': seasonality
})

print("📅 Time Series Data:")
print(ts_data.head())
print(f"\nDate range: {ts_data['date'].min()} to {ts_data['date'].max()}")
print(f"Data points: {len(ts_data)}")

In [None]:
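# Plot the components we built the series from (known here only because the data is
# synthetic; with real data they must be estimated, e.g. with statsmodels' seasonal_decompose)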
fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)

# 1. Original series
axes[0].plot(ts_data['date'], ts_data['value'], linewidth=1, alpha=0.8)
axes[0].set_title('Original Time Series', fontsize=12)
axes[0].set_ylabel('Value')
axes[0].grid(True, alpha=0.3)

# 2. Trend
axes[1].plot(ts_data['date'], ts_data['trend'], color='red', linewidth=2)
axes[1].set_title('Trend Component', fontsize=12)
axes[1].set_ylabel('Trend')
axes[1].grid(True, alpha=0.3)

# 3. Seasonality
axes[2].plot(ts_data['date'], ts_data['seasonality'], color='green', linewidth=1)
axes[2].set_title('Seasonal Component', fontsize=12)
axes[2].set_ylabel('Seasonality')
axes[2].grid(True, alpha=0.3)

# 4. Residuals
residuals = ts_data['value'] - ts_data['trend'] - ts_data['seasonality']
axes[3].plot(ts_data['date'], residuals, color='gray', linewidth=0.5, alpha=0.7)
axes[3].axhline(y=0, color='black', linestyle='--', linewidth=1)
axes[3].set_title('Residual Component (Noise)', fontsize=12)
axes[3].set_ylabel('Residuals')
axes[3].set_xlabel('Date')
axes[3].grid(True, alpha=0.3)

plt.suptitle('Time Series Decomposition', fontsize=16, y=0.995)
plt.tight_layout()
plt.show()

print("\n💡 Time series components:")
print(f"- Trend: Long-term direction ({ts_data['trend'].iloc[0]:.1f} → {ts_data['trend'].iloc[-1]:.1f})")
print(f"- Seasonality: Yearly cycle (amplitude: {ts_data['seasonality'].max():.1f})")
print(f"- Residuals: Random variation (std: {residuals.std():.2f})")
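With real data the components are unknown and have to be estimated. The sketch below is a hand-rolled version of classical additive decomposition in pandas. A centred moving average over one full period estimates the trend. Averaging the detrended values at each position in the cycle estimates the seasonal pattern. Whatever remains is the residual. To stay self-contained it regenerates a similar series with a different random generator, so the values differ from `ts_data`. For real work, `statsmodels.tsa.seasonal.seasonal_decompose` implements the same moving-average method.

```python
import numpy as np
import pandas as pd

# Rebuild a series like the one above: linear trend + yearly sine wave + noise
rng = np.random.default_rng(0)
n, period = 365 * 3, 365
t = np.arange(n)
true_trend = np.linspace(100, 300, n)
true_season = 50 * np.sin(2 * np.pi * t / period)
series = pd.Series(true_trend + true_season + rng.normal(0, 10, n))

# 1. Trend: a centred moving average over exactly one period cancels the seasonality
#    (the first and last half-period get NaN because the window does not fit)
trend_hat = series.rolling(window=period, center=True).mean()

# 2. Seasonality: average the detrended values at each position in the cycle,
#    then centre the pattern so it sums to zero over one period
detrended = series - trend_hat
season_profile = detrended.groupby(t % period).mean()   # NaNs are skipped
season_profile -= season_profile.mean()
season_hat = pd.Series(season_profile.to_numpy()[t % period])

# 3. Residual: whatever the trend and seasonal estimates do not explain
resid_hat = series - trend_hat - season_hat
print(f"residual std (where defined): {resid_hat.std():.2f}")
```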

In [None]:
# Rolling statistics
ts_data['MA_30'] = ts_data['value'].rolling(window=30).mean()
ts_data['MA_90'] = ts_data['value'].rolling(window=90).mean()

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(ts_data['date'], ts_data['value'], label='Original', alpha=0.4, linewidth=1)
ax.plot(ts_data['date'], ts_data['MA_30'], label='30-day MA', linewidth=2)
ax.plot(ts_data['date'], ts_data['MA_90'], label='90-day MA', linewidth=2)
ax.set_title('Time Series with Moving Averages', fontsize=14)
ax.set_xlabel('Date')
ax.set_ylabel('Value')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

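print("\n💡 Moving averages:")
print("- Smooth short-term fluctuations (the longer the window, the smoother the line)")
print("- Highlight long-term trends")
print("- Trailing windows lag the data by about half the window; center=True removes the lag")
print("- On their own they are only a naive forecasting baseline")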
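Trailing moving averages lag behind the data. On a perfectly linear series, a trailing window of length w sits (w - 1) / 2 steps behind, while a centred window of the same length does not. The sketch below checks this with pandas `rolling` on a toy linear series.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, dtype=float))   # pure linear trend: value = time index
window = 31

trailing = s.rolling(window).mean()               # uses the current and previous 30 points
centred = s.rolling(window, center=True).mean()   # uses 15 points on each side

print(trailing.iloc[50], centred.iloc[50])   # 35.0 50.0
```

A centred window needs future values, so it suits smoothing historical data but cannot run in real time.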

---
# Part C: Advanced Visualization Techniques

Advanced visualization techniques enable interactive exploration and complex data representation.

## Lab 8: Interactive Visualizations - Using Plotly 🖱️

### 📚 Concept Explanation
Interactive visualizations allow:
- **Zooming and panning**: Detailed exploration of specific areas
- **Hover information**: Display detailed data on mouse over
- **Filtering**: Toggle data series on/off
- **Animation**: Visualize changes over time

In [None]:
# Interactive scatter plot
fig_scatter = px.scatter(iris, 
                        x='sepal_length', 
                        y='sepal_width',
                        color='species',
                        size='petal_length',
                        hover_data=['petal_width'],
                        title='Interactive Iris Dataset Exploration',
                        labels={'sepal_length': 'Sepal Length (cm)',
                               'sepal_width': 'Sepal Width (cm)',
                               'petal_length': 'Petal Length (cm)'})

fig_scatter.update_layout(
    width=900,
    height=600,
    hovermode='closest'
)

fig_scatter.show()

print("\n💡 Interactive features:")
print("• Hover over points for detailed information")
print("• Click a legend item to hide/show that species; double-click to isolate it")
print("• Click and drag to zoom into a region; double-click to reset the view")
print("• Choose the Pan tool in the toolbar, then drag to pan")

In [None]:
# Monthly averages with a range slider (interactive, but not animated)
ts_monthly_agg = (ts_data.set_index('date')['value']
                  .resample('MS').mean()   # 'MS': calendar-month buckets labelled by month start
                  .reset_index())

fig_monthly = px.line(ts_monthly_agg,
                      x='date',
                      y='value',
                      title='Monthly Average Time Series',
                      labels={'value': 'Average Value', 'date': 'Date'})

fig_monthly.update_traces(mode='lines+markers')
fig_monthly.update_layout(
    width=900,
    height=500,
    xaxis_rangeslider_visible=True
)

fig_monthly.show()

print("\n💡 Time series features:")
print("• Range slider for period selection")
print("• Zoom in to specific date ranges")
print("• Hover for exact values")

## Lab 9: Geographic Data Visualization 🗺️

### 📚 Concept Explanation
Geographic visualizations help:
- **Spatial patterns**: Identify regional trends
- **Distributions**: Visualize geographic data spread
- **Comparisons**: Compare regions

In [None]:
# Illustrative sample data for 10 US states (approximate values for practice, not official statistics)
us_states = ['California', 'Texas', 'Florida', 'New York', 'Pennsylvania',
            'Illinois', 'Ohio', 'Georgia', 'North Carolina', 'Michigan']

geo_data = pd.DataFrame({
    'state': us_states,
    'code': ['CA', 'TX', 'FL', 'NY', 'PA', 'IL', 'OH', 'GA', 'NC', 'MI'],
    'population': [39.5, 29.0, 21.5, 19.5, 12.8, 12.7, 11.7, 10.6, 10.5, 10.0],
    'gdp_per_capita': [75000, 62000, 45000, 85000, 60000, 65000, 55000, 50000, 52000, 48000],
    'unemployment': [4.2, 3.8, 3.5, 4.5, 4.0, 4.3, 3.9, 3.7, 3.8, 4.1]
})

print("🗺️ Geographic Data Sample:")
print(geo_data.head())

In [None]:
# Choropleth Map
fig_choropleth = px.choropleth(
    geo_data,
    locations='code',
    locationmode='USA-states',
    color='gdp_per_capita',
    hover_name='state',
    hover_data={'population': True, 'unemployment': True},
    color_continuous_scale='Viridis',
    title='US GDP per Capita by State',
    labels={'gdp_per_capita': 'GDP per Capita ($)',
           'population': 'Population (M)',
           'unemployment': 'Unemployment (%)'}
)

fig_choropleth.update_geos(
    scope='usa',
    projection_type='albers usa',
    showlakes=True,
    lakecolor='rgb(255, 255, 255)'
)

fig_choropleth.update_layout(height=500)
fig_choropleth.show()

print("\n💡 Geographic visualization applications:")
print("• Regional comparisons and pattern discovery")
print("• Spatial cluster identification")
print("• Population/economic data representation")
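The map above uses a continuous color scale. A common alternative is a *classed* choropleth, which bins values into a few ordered classes. Readers then compare classes instead of decoding exact shades. Below is a minimal sketch with `pd.qcut` (quantile classes, roughly equal numbers of states per class), reusing the sample GDP figures from `geo_data`.

```python
import pandas as pd

# GDP-per-capita sample values from geo_data above, keyed by state code
gdp = pd.Series([75000, 62000, 45000, 85000, 60000, 65000, 55000, 50000, 52000, 48000],
                index=['CA', 'TX', 'FL', 'NY', 'PA', 'IL', 'OH', 'GA', 'NC', 'MI'])

# Quantile classification: 3 ordered classes with roughly equal numbers of states
classes = pd.qcut(gdp, q=3, labels=['Low', 'Medium', 'High'])
print(classes.value_counts().sort_index())
```

The resulting labels could be added to `geo_data` and passed to `px.choropleth` as a discrete `color` column. Choropleths generally work best with rates or per-capita values, as here, rather than raw totals. With raw totals, the largest states would dominate the map.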

## Lab 10: Dashboard-Style Comprehensive Visualizations 📊

### 📚 Concept Explanation
Dashboards provide integrated insights by combining multiple charts.

In [None]:
# Generate business KPI data
np.random.seed(42)
months = pd.date_range('2023-01-01', periods=12, freq='MS')  # month-start dates; the 'M' alias is deprecated in recent pandas

kpi_data = pd.DataFrame({
    'month': months,
    'revenue': np.random.uniform(80, 120, 12) * 1000000,
    'customers': np.random.uniform(8000, 12000, 12).astype(int),
    'conversion_rate': np.random.uniform(2, 5, 12),
    'churn_rate': np.random.uniform(5, 8, 12),
    'nps_score': np.random.uniform(30, 70, 12)
})

departments = ['Sales', 'Marketing', 'Engineering', 'Support', 'Operations']
dept_performance = pd.DataFrame({
    'department': departments,
    'headcount': [50, 30, 80, 25, 15],
    'efficiency': [85, 78, 92, 88, 81],
    'budget_used': [92, 88, 95, 78, 85]
})

print("📊 KPI Dashboard data prepared")

In [None]:
# Dashboard with Plotly Subplots
fig = make_subplots(
    rows=3, cols=3,
    subplot_titles=('Monthly Revenue Trend', 'Customer Count Change', 'Conversion vs Churn',
                   'NPS Score Trend', 'Headcount by Department', 'Department Efficiency',
                   'YTD Performance Summary', 'Budget Utilization', 'KPI Correlation'),
    row_heights=[0.35, 0.35, 0.3],
    column_widths=[0.35, 0.35, 0.3],
    specs=[[{'type': 'scatter'}, {'type': 'bar'}, {'type': 'scatter'}],
           [{'type': 'scatter'}, {'type': 'pie'}, {'type': 'bar'}],
           [{'type': 'indicator'}, {'type': 'bar'}, {'type': 'heatmap'}]],
    vertical_spacing=0.12,
    horizontal_spacing=0.10
)

# 1. Monthly revenue trend
fig.add_trace(
    go.Scatter(x=kpi_data['month'], y=kpi_data['revenue']/1000000,
              mode='lines+markers', name='Revenue',
              line=dict(color='#1f77b4', width=3),
              marker=dict(size=8)),
    row=1, col=1
)

# 2. Customer count change
fig.add_trace(
    go.Bar(x=kpi_data['month'], y=kpi_data['customers'],
          name='Customers',
          marker_color='#2ca02c'),
    row=1, col=2
)

# 3. Conversion vs Churn
fig.add_trace(
    go.Scatter(x=kpi_data['month'], y=kpi_data['conversion_rate'],
              mode='lines+markers', name='Conversion Rate',
              line=dict(color='#4ECDC4', width=2)),
    row=1, col=3
)
fig.add_trace(
    go.Scatter(x=kpi_data['month'], y=kpi_data['churn_rate'],
              mode='lines+markers', name='Churn Rate',
              line=dict(color='#FF6B6B', width=2, dash='dash')),
    row=1, col=3
)

# 4. NPS Score
fig.add_trace(
    go.Scatter(x=kpi_data['month'], y=kpi_data['nps_score'],
              mode='lines+markers', name='NPS',
              fill='tozeroy',
              line=dict(color='#9467bd', width=2)),
    row=2, col=1
)

# 5. Headcount by department (pie chart)
fig.add_trace(
    go.Pie(labels=dept_performance['department'],
          values=dept_performance['headcount'],
          hole=0.4,
          textinfo='label+percent'),  # the legend is hidden below, so label slices directly
    row=2, col=2
)

# 6. Department efficiency
fig.add_trace(
    go.Bar(x=dept_performance['department'],
          y=dept_performance['efficiency'],
          marker_color=['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4', '#FFEAA7']),
    row=2, col=3
)

# 7. YTD performance indicator (Gauge)
fig.add_trace(
    go.Indicator(
        mode = "gauge+number+delta",
        value = kpi_data['revenue'].sum()/1000000,
        title = {'text': "YTD Revenue (M$)"},
        delta = {'reference': 1100},
        gauge = {'axis': {'range': [None, 1500]},
                'bar': {'color': "#2ca02c"},
                'steps': [
                    {'range': [0, 500], 'color': "lightgray"},
                    {'range': [500, 1000], 'color': "gray"}],
                'threshold': {'line': {'color': "red", 'width': 4},
                            'thickness': 0.75, 'value': 1200}}),
    row=3, col=1
)

# 8. Budget utilization
fig.add_trace(
    go.Bar(y=dept_performance['department'],
          x=dept_performance['budget_used'],
          orientation='h',
          marker_color='#17becf'),
    row=3, col=2
)

# 9. KPI correlation
kpi_corr = kpi_data[['revenue', 'customers', 'conversion_rate', 'churn_rate', 'nps_score']].corr()
fig.add_trace(
    go.Heatmap(z=kpi_corr.values,
              x=['Revenue', 'Customers', 'Conv Rate', 'Churn', 'NPS'],
              y=['Revenue', 'Customers', 'Conv Rate', 'Churn', 'NPS'],
              colorscale='RdBu',
              zmid=0,
              text=kpi_corr.values.round(2),
              texttemplate='%{text}',
              textfont={"size": 8}),
    row=3, col=3
)

# Update layout
fig.update_layout(
    title_text="📊 Business KPI Dashboard - 2023",
    title_font_size=20,
    showlegend=False,
    height=900,
    plot_bgcolor='rgba(240,240,240,0.5)',
    paper_bgcolor='white'
)

# Update axis labels
fig.update_xaxes(title_text="Month", row=1, col=1, tickformat='%b')
fig.update_yaxes(title_text="Revenue (M$)", row=1, col=1)

fig.show()

print("\n💡 Dashboard design essentials:")
print("• Key KPIs at a glance")
print("• Consistent color theme and layout")
print("• Appropriate combination of various chart types")
print("• Interactive elements for detailed exploration")

---
# 📝 Practice Summary and Key Takeaways

## What We Learned Today

### Grammar of Graphics Essentials
- Systematic approach to visualization by breaking it down into components
- Building visualizations in a layered manner
- Pattern discovery through scale and coordinate transformations
- Multidimensional data exploration through faceting

### Practical EDA Techniques
- **Univariate Analysis**: Understanding distributions, outliers, and central tendencies
- **Bivariate Analysis**: Discovering correlations and dependencies
- **Categorical Data**: Group comparisons and compositions
- **Time Series Analysis**: Trends, seasonality, and volatility

### Advanced Visualizations
- **Interactive Visualizations**: Dynamic exploration with Plotly
- **Geographic Data**: Spatial pattern visualization
- **Dashboards**: Providing integrated insights

## 💪 Practice Assignments

1. **Perform EDA with Your Own Data**
   - Analyze in order: univariate → bivariate → multivariate
   - Use at least 5 different visualization techniques

2. **Create an Interactive Dashboard**
   - Use real work data
   - Add interactive elements with Plotly

3. **Improve Visualizations**
   - Reconstruct existing report charts from a Grammar of Graphics perspective
   - Optimize color, layout, and information density

## 📚 Additional Learning Resources

- **Book**: "The Grammar of Graphics" by Leland Wilkinson
- **Online**: Plotly Official Documentation (https://plotly.com/python/)
- **Course**: Coursera "Applied Data Science with Python"
- **Practice**: Learn from EDA examples on Kaggle Notebooks

---

### 🎯 Remember:
> "The purpose of visualization is insight, not pictures."
> - Ben Shneiderman

Good visualization tells the story hidden in data.
Keep practicing and experimenting to develop your own style!

---
**Questions and feedback are always welcome!** 🙋‍♂️🙋‍♀️