# Data Visualization

## Learning Objectives
After completing this lesson, you will be able to:
- Create and customize visualizations using matplotlib
- Build subplots and multi-plot figures
- Use Pandas plotting methods for quick visualization
- Create statistical plots with seaborn
- Understand distribution and relationship visualization techniques

---

## Why Data Visualization Matters
Making informative visualizations is one of the most important tasks in data analysis. You may need to:
- **Explore data**: Identify patterns, outliers, and distributions through EDA (Exploratory Data Analysis)
- **Communicate findings**: Present results to both technical and non-technical audiences
- **Tell stories**: Use interactive visualizations to engage stakeholders

This lesson covers three powerful visualization tools: **matplotlib** (foundation), **pandas plotting** (convenience), and **seaborn** (statistical graphics).

## Section 1: Setup and Matplotlib Basics

### Importing Required Libraries
First, we will import the libraries needed for visualization. Matplotlib is the foundation; seaborn and pandas plotting build on top of it.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set random seed for reproducibility
np.random.seed(111)

### Understanding Matplotlib

`matplotlib` is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader Pandas ecosystem. It provides an **object-oriented API** for plotting graphs and charts.

**Key concept**: Over time, matplotlib has spawned higher-level tools like `seaborn` that simplify common visualization tasks while using matplotlib's power underneath.

### Creating Your First Plot: Line Plot

The simplest plot type in matplotlib is the **line plot**, which is the default when you call `plt.plot()`.

#### Example 1: Simple NumPy Array

In [None]:
# Create simple data and plot it
data = np.arange(10)
print('Data:', data)

plt.plot(data)
plt.show()

**What happened?** Matplotlib automatically creates a plot with:
- x-axis: indices (0 to 9)
- y-axis: values (0 to 9)
- Default line style and color
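
When the x positions matter, both coordinates can be passed explicitly instead of relying on the index. A minimal sketch, assuming a sine curve as illustrative data (the names `x`, `y`, and `line` are not from the lesson):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data: 50 evenly spaced x values and their sine
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # x is given explicitly, so the x-axis is not the array index

# The returned Line2D object stores the coordinates that were passed in
print(line.get_xdata()[:3])
```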

#### Example 2: Pandas Series with Custom Index

When plotting a Pandas Series, matplotlib uses:
- x-axis: the Series index
- y-axis: the Series values

In [None]:
# Create a Series with custom index
data = pd.Series(np.arange(5), index=np.arange(2, 12, 2))
print('Series:')
print(data)

plt.plot(data)
plt.title('Pandas Series Plot')
plt.show()

## Section 2: Working with Figures and Subplots

### Understanding the Matplotlib Hierarchy

Matplotlib's structure follows a hierarchy:
1. **Figure**: The overall window/canvas
2. **Axes**: Individual plots within the figure (subplots)
3. **Plot methods**: Actual drawing methods (`.plot()`, `.scatter()`, etc.)
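
The hierarchy above can be inspected directly on the objects. The following is a small sketch with made-up data, showing that a Figure knows its Axes and an Axes knows the lines drawn on it:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()                  # one Figure containing one Axes
(line,) = ax.plot([0, 1, 2], [0, 1, 4])   # a plot method draws a line on that Axes

print(fig.axes == [ax])      # the Figure lists the Axes it contains
print(ax.figure is fig)      # the Axes points back to its parent Figure
print(ax.lines[0] is line)   # the Axes holds the line that plot() created
```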

### Creating a Basic Figure and Axes

To build more complex visualizations, we need to create a Figure object and add Axes (subplots) to it.

In [None]:
# Step 1: Create a new figure
fig = plt.figure()

# Step 2: Add an axes (subplot) to the figure
ax = fig.add_subplot()

# Step 3: Plot data on the axes
ax.plot(data)  # Using the Series from earlier
plt.show()

### Creating Multiple Subplots

Use `add_subplot(rows, cols, index)` to create a grid of plots. For example:
- `add_subplot(2, 2, 1)` creates a 2x2 grid and places the plot at position 1 (top-left)

In [None]:
# Create a figure with a 2x2 subplot grid
fig = plt.figure(figsize=(10, 8))

# Create subplots at different positions
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)

print('Figure created with 3 empty subplots')
fig

### Adding Different Plot Types to Subplots

**Code explanation**:
- `ax1.hist()`: Creates a histogram with 20 bins and 30% opacity (`alpha=0.3`)
- `ax2.scatter()`: Creates a scatter plot of a linear trend with random noise added
- `ax3.plot()`: Creates a line plot with custom styling (black color, dashed line)

In [None]:
# Add different plot types to each subplot
ax1.hist(np.random.standard_normal(100), bins=20, color='black', alpha=0.3)
ax1.set_title('Histogram')

ax2.scatter(np.arange(30), np.arange(30) + 3 * np.random.standard_normal(30))
ax2.set_title('Scatter Plot')

ax3.plot(np.random.standard_normal(50).cumsum(), color='black', linestyle='dashed')
ax3.set_title('Cumulative Line Plot')

fig.tight_layout()  # Adjust spacing automatically
fig

### Using plt.subplots() for Faster Setup

Instead of creating a figure and adding subplots individually, `plt.subplots()` creates both in one call and returns them as a tuple.

In [None]:
# Create a 2x2 subplot grid in one line
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# axes is a 2D NumPy array
print('Type of axes:', type(axes))
print('Shape of axes:', axes.shape)
print('\nAccess subplot at row 0, col 1:')
print(axes[0, 1])
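
The shape of the object returned by `plt.subplots()` depends on the grid requested, which is a common source of indexing errors. A short sketch of the cases (variable names here are illustrative only):

```python
import matplotlib.pyplot as plt
import numpy as np

fig1, ax_single = plt.subplots()                     # no grid: a single Axes, not an array
fig2, ax_row = plt.subplots(1, 3)                    # one row: 1D array of Axes
fig3, ax_grid = plt.subplots(2, 2)                   # full grid: 2D array of Axes
fig4, ax_forced = plt.subplots(1, 3, squeeze=False)  # squeeze=False always gives a 2D array

print(isinstance(ax_single, np.ndarray))
print(ax_row.shape, ax_grid.shape, ax_forced.shape)

# .flat iterates over every Axes regardless of the grid shape
for number, ax in enumerate(ax_grid.flat, start=1):
    ax.set_title(f'Panel {number}')
```

Passing `squeeze=False` can make looping code simpler, because `axes[i, j]` then works for any grid size.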

### Populating the Subplot Grid

**Code explanation**:
- Loop through rows and columns using `range(2)`
- Access each axes using `axes[i, j]`
- `sharex=True, sharey=True`: All plots share the same scale

In [None]:
# Create subplots with shared axes (all plots use same scale)
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 10))

# Fill each subplot with a histogram
for i in range(2):
    for j in range(2):
        axes[i, j].hist(np.random.standard_normal(500), bins=50,
                        color='black', alpha=0.5)
        axes[i, j].set_title(f'Histogram {i},{j}')

# Control spacing between subplots
fig.subplots_adjust(wspace=0, hspace=0)
fig.suptitle('2x2 Histogram Grid with Shared Axes', fontsize=14, y=1.00)
fig
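
To see what sharing an axis does, it can help to plot data with very different ranges on two panels. A minimal sketch with made-up values:

```python
import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, sharey=True)
left.plot([0, 1], [0, 1])       # small values
right.plot([0, 1], [0, 100])    # much larger values

# With sharey=True both panels report the same y-limits,
# wide enough to contain the larger of the two datasets
print(left.get_ylim())
print(right.get_ylim())
```

Shared axes make panels directly comparable, at the cost of flattening any panel whose data covers a much smaller range.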

## Section 3: Styling and Customization

### Understanding Plot Styling

The `plot()` function accepts optional styling arguments to customize line appearance, colors, and markers.

### Styling Parameters

Three main styling parameters:

| Parameter | Purpose | Examples |
|-----------|---------|----------|
| `color` | Line color | `'green'`, `'g'`, `'#008000'` |
| `linestyle` | Line style | `'dashed'`, `'--'`, `'-.'`, `':'` |
| `marker` | Data point marker | `'o'` (circle), `'s'` (square), `'^'` (triangle) |

For a full list, refer to [matplotlib documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html).

In [None]:
# Example 1: Using keyword arguments for styling
fig, ax = plt.subplots(figsize=(10, 5))

data = np.random.standard_normal(30).cumsum()
ax.plot(data, color='green', linestyle='dashed', marker='o', linewidth=2, markersize=6)
ax.set_title('Styled Line Plot')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
fig

### Using Format Strings for Quick Styling

Format string syntax: `'[marker][line][color]'` where each component is optional.

**Example**: `'o--g'` means: circle marker + dashed line + green color

In [None]:
# Example 2: Using format strings (shorthand)
fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(data, 'o--g', linewidth=2, markersize=6, label='Format: o--g')
ax.set_title('Using Format String for Styling')
ax.legend()
fig
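
A format string is shorthand for the keyword arguments shown earlier. The sketch below, using illustrative data, draws one line each way and checks that the resulting style properties match:

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.arange(5)  # illustrative data

fig, ax = plt.subplots()
(short_form,) = ax.plot(values, 'o--g')
(long_form,) = ax.plot(values + 1, marker='o', linestyle='--', color='g')

# Both lines end up with the same marker, line style, and color
for line in (short_form, long_form):
    print(line.get_marker(), line.get_linestyle(), line.get_color())
```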

### Plotting Multiple Lines and Creating Legends

Plot multiple series on the same axes and use legends to identify them.

In [None]:
# Plot multiple lines with different styles
fig, ax = plt.subplots(figsize=(10, 5))

data = np.random.standard_normal(30).cumsum()

# Plot 1: Default line
ax.plot(data, color='black', label='Default', linewidth=2)

# Plot 2: Same data, but with step style
ax.plot(data, color='black', linestyle='dashed', drawstyle='steps-post',
        label='Steps-post', linewidth=2)

# Add legend
ax.legend(loc='best')
ax.set_title('Multiple Lines with Legend')
ax.set_xlabel('Time')
ax.set_ylabel('Value')
fig

## Section 4: Axes, Labels, and Limits

### Controlling Plot Range and Ticks

You can control:
- **Plot range**: `ax.set_xlim()`/`ax.set_ylim()` - set axis bounds
- **Tick positions**: `ax.set_xticks()`/`ax.set_yticks()` - where ticks appear
- **Tick labels**: `ax.set_xticklabels()`/`ax.set_yticklabels()` - tick label text
- **Axis labels and title**: `ax.set_xlabel()`/`ax.set_ylabel()` and `ax.set_title()`

In [None]:
# Create a plot with random walk data
fig, ax = plt.subplots(figsize=(10, 5))

rng = np.random.default_rng(seed=111)
cumsum_data = rng.standard_normal(1000).cumsum()

ax.plot(cumsum_data)
fig

### Customizing X-Axis Ticks and Labels

**Code explanation**:
- `set_xticks()`: Position where ticks appear
- `set_xticklabels()`: Labels for each tick
- `rotation`: Rotate labels for readability

In [None]:
# Set specific tick positions and labels
ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xticklabels(['Start', 'Q1', 'Mid', 'Q3', 'End'], rotation=45, fontsize=9)

# Add axis labels and title
ax.set_xlabel('Time Period', fontsize=11, fontweight='bold')
ax.set_ylabel('Cumulative Value', fontsize=11, fontweight='bold')
ax.set_title('Random Walk with Custom Axes', fontsize=13, fontweight='bold')

fig.tight_layout()
fig
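
The cells above customize ticks and labels but leave the plot range automatic. A short sketch of setting the range explicitly, using a made-up square-root curve:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100)
fig, ax = plt.subplots()
ax.plot(x, np.sqrt(x))

print('Automatic x-limits:', ax.get_xlim())

# Zoom in on part of the curve by fixing both axis ranges
ax.set_xlim(20, 60)
ax.set_ylim(0, 10)
print('Manual x-limits:', ax.get_xlim())
```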

## Section 5: Pandas Plotting Methods

### Quick Visualization with Pandas

Pandas objects (Series and DataFrame) have built-in `.plot()` methods that use matplotlib under the hood. This provides a convenient high-level interface for quick visualization.

In [None]:
# Create example data
# For demo purposes, simulate an S&P 500-like price series as a random walk
dates = pd.date_range('2007-01-01', periods=1000)
spx = pd.Series(np.random.standard_normal(1000).cumsum() + 1000, index=dates)

print('Synthetic data created')
print(spx.head())

### Pandas Series Plot (Line Plot)

By default, `.plot()` creates a line plot with the Series index on the x-axis and the values on the y-axis.

In [None]:
# Simple line plot from Pandas Series
spx.plot(figsize=(10, 5))
plt.title('S&P 500 Index')
plt.ylabel('Price')
plt.show()
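
Because pandas draws with matplotlib, its `.plot()` method returns a regular matplotlib Axes and accepts an existing one through `ax=`. A minimal sketch, assuming a small illustrative Series rather than the `spx` data:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative Series with a date index
squares = pd.Series(np.arange(10.0) ** 2,
                    index=pd.date_range('2024-01-01', periods=10))

fig, ax = plt.subplots(figsize=(8, 4))
returned_ax = squares.plot(ax=ax, label='squares')  # draw onto the Axes we created

# The returned object is that same Axes, so matplotlib methods apply as usual
print(returned_ax is ax)
ax.set_ylabel('Value')
ax.legend()
```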

### Different Plot Types with Pandas

Use the `kind` parameter or the `.plot.<type>()` accessor to change the plot type.

Common types: `'line'`, `'bar'`, `'barh'`, `'hist'`, `'box'`, `'kde'`, `'area'`, and, for DataFrames only, `'scatter'`

In [None]:
# Histogram plot
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Method 1: Using kind parameter
spx.plot(kind='hist', bins=50, ax=axes[0], title="Histogram (kind='hist')")

# Method 2: Using accessor method
spx.plot.hist(bins=50, ax=axes[1], title='Histogram (plot.hist())')

fig.tight_layout()
fig

### Plotting DataFrames with Multiple Columns

When plotting a DataFrame, each column becomes a separate line, and a legend is added automatically.

In [None]:
# Create a DataFrame with multiple columns
df = pd.DataFrame({
    'Series_1': np.random.standard_normal(100).cumsum(),
    'Series_2': np.random.standard_normal(100).cumsum(),
    'Series_3': np.random.standard_normal(100).cumsum()
})

# Plot all columns
df.plot(figsize=(10, 5))
plt.title('Multiple Time Series')
plt.ylabel('Value')
plt.show()
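
When the columns are hard to read on one set of axes, `subplots=True` gives each column its own panel. A sketch under the assumption of a small random DataFrame (the column names are illustrative):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
walks = pd.DataFrame(rng.standard_normal((100, 3)).cumsum(axis=0),
                     columns=['A', 'B', 'C'])

# One panel per column, stacked vertically and sharing the x-axis
panel_axes = walks.plot(subplots=True, sharex=True, figsize=(8, 6))

print(len(panel_axes))  # one Axes per column
```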

## Section 6: Statistical Visualization with Seaborn

### Introduction to Seaborn

`seaborn` is a library that builds on matplotlib to provide a higher-level interface for **statistical graphics**. It:
- Integrates seamlessly with pandas DataFrames
- Automatically handles data aggregation and statistics
- Provides better default styling
- Simplifies complex visualizations

### Bar Plots: Comparing Categories

Bar plots compare values across categorical groups. The height represents the value being compared.

In [None]:
# Create sample tips dataset
tips = pd.DataFrame({
    'total_bill': [18.01, 10.29, 21.01, 23.68, 24.59, 25.29, 8.77, 15.01, 11.02, 14.07,
                   12.5, 16.0, 14.5, 13.2, 15.8, 17.3, 18.5, 19.0, 20.0, 21.5],
    'tip': [1.01, 1.66, 3.5, 3.31, 3.61, 4.71, 2, 3.12, 1.96, 3.23,
            2.5, 3.0, 2.8, 2.4, 3.2, 3.5, 3.8, 3.9, 4.2, 4.5],
    'time': ['Dinner', 'Lunch', 'Lunch', 'Lunch', 'Lunch', 'Lunch', 'Lunch', 'Dinner', 'Lunch', 'Lunch',
             'Dinner', 'Lunch', 'Lunch', 'Dinner', 'Dinner', 'Lunch', 'Lunch', 'Dinner', 'Dinner', 'Lunch'],
    'day': ['Thurs', 'Sun', 'Sat', 'Sun', 'Sun', 'Sat', 'Fri', 'Fri', 'Fri', 'Fri',
            'Thurs', 'Sun', 'Sat', 'Sun', 'Sun', 'Sat', 'Fri', 'Fri', 'Fri', 'Fri']
})

# Calculate tip percentage
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

print('Tips dataset:')
print(tips.head())

In [None]:
# Seaborn bar plot: Average tip percentage by day
# Note: Seaborn automatically calculates the mean and shows 95% confidence intervals
sns.barplot(x='day', y='tip_pct', data=tips, orient='v')
plt.title('Average Tip Percentage by Day')
plt.show()
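
The bar heights are the per-group means, which can be cross-checked with a pandas `groupby`. A minimal sketch using a hypothetical five-row dataset, not the lesson's `tips` table:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical mini dataset for illustration
demo = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
    'tip_pct': [0.10, 0.20, 0.15, 0.25, 0.20],
})

# The aggregation that barplot performs by default: the mean of y within each x group
day_means = demo.groupby('day')['tip_pct'].mean()
print(day_means)

fig, ax = plt.subplots()
sns.barplot(data=demo, x='day', y='tip_pct', ax=ax)
```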

### Adding a Second Categorical Variable with `hue`

Use the `hue` parameter to split bars by an additional categorical variable, creating grouped bars.

In [None]:
# Bar plot with hue (color grouping)
sns.barplot(x='day', y='tip_pct', hue='time', data=tips)
plt.title('Tip Percentage by Day and Time')
plt.show()

### Visualizing Distributions: Histograms and Density Plots

**Histograms** show the frequency distribution of a continuous variable using bins.
**KDE (Kernel Density Estimate)** plots show a smooth estimate of the probability density.

The **`bins` parameter** controls granularity - more bins show finer details, fewer bins show broader patterns.

In [None]:
# Create figure with different bin sizes
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Filter extreme values first (recommended practice)
q1 = tips['tip_pct'].quantile(0.01)
q99 = tips['tip_pct'].quantile(0.99)
filtered_tips = tips[(tips['tip_pct'] > q1) & (tips['tip_pct'] < q99)]

# Plot with different bin counts
for idx, bins in enumerate([10, 20, 50]):
    filtered_tips['tip_pct'].plot.hist(ax=axes[idx], bins=bins, alpha=0.7)
    axes[idx].set_title(f'Bins = {bins}')
    axes[idx].set_xlabel('Tip Percentage')

fig.tight_layout()
fig
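
Changing `bins` only changes how the same observations are grouped. The sketch below uses `np.histogram` on illustrative random data to show the counts behind a histogram for two bin settings:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=200)  # illustrative data

results = {}
for n_bins in (5, 20):
    counts, edges = np.histogram(sample, bins=n_bins)
    results[n_bins] = (counts, edges)
    # n_bins bins need n_bins + 1 edges; every observation falls in exactly one bin
    print(f'bins={n_bins}: {len(edges)} edges, counts sum to {counts.sum()}')
```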

In [None]:
# Seaborn histogram with KDE overlay
# Code explanation:
# - kde=True: Adds a smooth probability density curve
# - bins=50: Controls histogram granularity
# - stat='density': Normalizes area under histogram to 1

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Histogram alone
sns.histplot(filtered_tips['tip_pct'], bins=50, ax=axes[0])
axes[0].set_title('Histogram Only')

# Histogram with KDE
sns.histplot(filtered_tips['tip_pct'], bins=50, kde=True, ax=axes[1], stat='density')
axes[1].set_title('Histogram with KDE Overlay')

fig.tight_layout()
fig
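
Density normalization rescales bar heights so that the total bar area equals 1, which puts the histogram on the same scale as a KDE curve. A sketch of that arithmetic with `np.histogram(density=True)`, which applies the same kind of normalization, on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=500)  # illustrative data

density, edges = np.histogram(sample, bins=20, density=True)
bin_widths = np.diff(edges)

# Area of each bar is height * width; the areas add up to 1
total_area = (density * bin_widths).sum()
print(total_area)
```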

### Relationships: Scatter and Regression Plots

Scatter plots show relationships between two continuous variables. Adding a regression line helps identify trends.

In [None]:
# Create sample relationship data
np.random.seed(42)
x_data = np.random.randn(100) * 2
y_data = 2 * x_data + np.random.randn(100) * 1.5 + 3

# Seaborn regplot: scatter plot with regression line
# Code explanation:
# - regplot() automatically fits and plots a linear regression
# - scatter_kws controls scatter point appearance
# - line_kws controls regression line appearance

sns.regplot(x=x_data, y=y_data, scatter_kws={'alpha': 0.5})
plt.title('Scatter Plot with Regression Line')
plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.show()
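
`regplot()` draws the fitted line but does not return its coefficients. If the numbers are needed, one option is to fit the same kind of straight line separately. The sketch below uses `np.polyfit`, which is a swapped-in technique for illustration and not seaborn's internal code, on data generated like the cell above:

```python
import numpy as np

rng = np.random.default_rng(42)
x_vals = rng.standard_normal(100) * 2
y_vals = 2 * x_vals + rng.standard_normal(100) * 1.5 + 3  # true slope 2, intercept 3

# Degree-1 least-squares fit returns [slope, intercept]
slope, intercept = np.polyfit(x_vals, y_vals, deg=1)
print(f'slope = {slope:.2f}, intercept = {intercept:.2f}')
```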

### Categorical Relationships: Faceted Plots

`catplot()` creates faceted grids of plots, split by categorical variables. This helps visualize complex multi-dimensional data.

In [None]:
# Create a faceted plot
# Code explanation:
# - col='time': Create a separate column (facet) for each time value (Lunch/Dinner)
# - x='day': Within each facet, draw one bar per day
# - kind='bar': Use bar plots

filtered = tips[tips['tip_pct'] < 1]  # Filter outliers

sns.catplot(x='day', y='tip_pct', col='time', kind='bar',
            data=filtered, height=4, aspect=1.2)
plt.suptitle('Tip Percentage by Day and Time (Faceted)', y=1.02)
plt.show()

### Box Plots: Distribution Summary

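Box plots are built around the five-number summary: **min, Q1, median, Q3, max**

This provides a compact view of the data distribution and outliers. The box spans the middle 50% of the data (Q1 to Q3), and the line inside it marks the median. When outliers are drawn as separate points, the whiskers stop at the most extreme non-outlier values rather than at the true min and max.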

In [None]:
# Box plot by category
# Code explanation:
# - kind='box': Creates box plot (box-and-whisker)
# - Box shows: median (line), quartiles (box)
# - Whiskers extend to the most extreme points within 1.5*IQR of the box; points beyond them are drawn as outliers

sns.catplot(x='day', y='total_bill', kind='box', data=tips, height=5, aspect=1.5)
plt.title('Distribution of Total Bill by Day')
plt.show()
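
The quantities a box plot draws can be computed by hand. A minimal sketch on a made-up array with one obvious outlier, using the common 1.5*IQR rule:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])  # illustrative data

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points outside these fences are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

print('Q1, median, Q3:', q1, median, q3)
print('Whiskers end at:', inside.min(), 'and', inside.max())
print('Outliers:', outliers)
```

Here the upper whisker ends at 9, the largest value inside the fence, and 30 is drawn as an outlier.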

### Pairwise Relationships: Pair Plot

`pairplot()` creates a matrix of plots showing relationships between all pairs of variables. The diagonal shows each variable's distribution, and the off-diagonal panels show pairwise relationships.

In [None]:
# Create sample numeric data for pairplot
sample_data = tips[['total_bill', 'tip', 'tip_pct']].sample(n=min(30, len(tips)))

# Pairplot: scatter plots for all pairs, distribution on diagonal
# Code explanation:
# - diag_kind='kde': Use KDE for diagonal (distribution)
# - plot_kws={'alpha': 0.5}: Make scatter points semi-transparent

sns.pairplot(sample_data, diag_kind='kde', plot_kws={'alpha': 0.5}, height=2)
plt.suptitle('Pairwise Relationships', y=1.00)
plt.show()

## Saving Plots to File

Use `savefig()` to export your plots in various formats. This is useful for including plots in reports, presentations, or publications.

In [None]:
# Create a simple plot for demonstration
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spx[:200], linewidth=2, label='S&P 500')
ax.set_title('Sample Plot for Saving')
ax.legend()

# Uncomment to save in different formats
# fig.savefig('plot.svg')  # Vector format (scalable, smaller file)
# fig.savefig('plot.png', dpi=300)  # Raster format (high resolution)
# fig.savefig('plot.pdf')  # PDF format (for reports)

print('Plot ready to save with: fig.savefig(filename)')
print('DPI options: 72 (screen), 150 (print), 300 (publication)')
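
A runnable version of the save step might look like the sketch below. It writes to a temporary directory so nothing is left behind; the file name `example.png` is an arbitrary choice for illustration, and `bbox_inches='tight'` trims surrounding whitespace:

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_title('Saved figure')

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / 'example.png'   # hypothetical file name
    fig.savefig(out_path, dpi=150, bbox_inches='tight')
    file_size = out_path.stat().st_size
    print(f'Wrote {out_path.name}: {file_size} bytes')
```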

## Global Configuration

Customize default matplotlib behavior for all subsequent plots using `plt.rc()`. This is useful when creating multiple plots that should have a consistent style.

In [None]:
# Set global defaults (uncomment to use)
# plt.rc('figure', figsize=(12, 6))  # Default figure size
# plt.rc('font', size=11)  # Default font size
# plt.rc('axes', labelsize=12)  # Axes label size
# plt.rc('lines', linewidth=2)  # Default line width

# View current settings
print('Current figure size:', plt.rcParams['figure.figsize'])
print('Current font size:', plt.rcParams['font.size'])
print('Current line width:', plt.rcParams['lines.linewidth'])

# Restore defaults if needed
# plt.rcdefaults()
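
For settings that should apply to only one figure, `plt.rc_context()` changes the defaults temporarily and restores them afterwards. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt

width_before = plt.rcParams['lines.linewidth']

# Inside the block, new lines pick up the temporary default
with plt.rc_context({'lines.linewidth': 4}):
    fig, ax = plt.subplots()
    (thick_line,) = ax.plot([0, 1, 2], [0, 1, 4])

print(thick_line.get_linewidth())                         # width used inside the block
print(plt.rcParams['lines.linewidth'] == width_before)    # global default is restored
```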

---

# Optional Content: Advanced Topics

The sections below cover advanced visualization techniques and supplementary material. These are optional and can be explored based on your interests.

## Advanced: Annotations and Arrows

Add text, arrows, and shapes to highlight important features in your plots. This is useful for emphasizing key findings or drawing attention to specific data points.

In [None]:
# Create a plot for annotation
fig, ax = plt.subplots(figsize=(12, 6))

data_to_plot = np.random.standard_normal(100).cumsum()
ax.plot(data_to_plot, linewidth=2)

# Code explanation for annotate():
# - xy: coordinate to annotate (data coordinates)
# - xytext: coordinate for the text (data coordinates by default; change with textcoords)
# - arrowprops: dict of arrow properties connecting annotation to data point

# Add annotation with arrow at peak
ax.annotate('Peak Point',
           xy=(data_to_plot.argmax(), data_to_plot.max()),
           xytext=(data_to_plot.argmax() - 20, data_to_plot.max() - 2),
           arrowprops=dict(arrowstyle='->', color='red', lw=2),
           fontsize=11, color='red', fontweight='bold')

# Add annotation at minimum
ax.annotate('Low Point',
           xy=(data_to_plot.argmin(), data_to_plot.min()),
           xytext=(data_to_plot.argmin() + 15, data_to_plot.min() + 2),
           arrowprops=dict(arrowstyle='->', color='blue', lw=2),
           fontsize=11, color='blue', fontweight='bold')

ax.set_title('Plot with Annotations')
fig
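
Placing the label in data coordinates can push it outside the plot when the data range changes. An alternative is to position the text as an offset from the annotated point with `textcoords='offset points'`. A sketch with a small made-up array:

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])  # illustrative data
peak_x, peak_y = int(y.argmax()), float(y.max())

fig, ax = plt.subplots()
ax.plot(y, marker='o')

note = ax.annotate('Peak',
                   xy=(peak_x, peak_y),         # point being annotated, in data coordinates
                   xytext=(-40, -30),           # label offset from that point, in points
                   textcoords='offset points',
                   arrowprops=dict(arrowstyle='->'))
```

The label then stays at the same visual distance from the point regardless of the axis limits.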

## Advanced: Seaborn Style Themes

Seaborn provides built-in style themes for consistent, professional-looking plots. Different themes are useful for different contexts (presentations, publications, web, etc.).

In [None]:
# Built-in seaborn styles (seaborn has no function that lists them)
print('Available seaborn styles:')
print(['darkgrid', 'whitegrid', 'dark', 'white', 'ticks'])

# Code explanation:
# - set_style() changes the overall look and feel
# - darkgrid: default with grid lines (good for reports)
# - whitegrid: clean with grid (good for presentations)
# - dark/white: minimal styling
# - ticks: adds tick marks

# Set style
sns.set_style('whitegrid')

# Create a plot with the new style
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x='day', y='tip_pct', data=tips, ax=ax)
ax.set_title('Bar Plot with Whitegrid Style')

fig
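
`set_style()` changes the style for every later plot. To style a single figure, `sns.axes_style()` can be used as a context manager instead. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Called with a style name, axes_style() returns the settings without applying them
whitegrid_params = sns.axes_style('whitegrid')
print(whitegrid_params['axes.grid'])

# As a context manager, the style applies only to figures created inside the block
with sns.axes_style('ticks'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title('Ticks style, this figure only')
```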

## Advanced: Color Palettes

Seaborn allows you to customize color palettes for better visual appeal and accessibility. Different palettes are suited for different purposes.

In [None]:
# Set a different color palette
# Options: 'deep', 'muted', 'pastel', 'bright', 'dark', 'colorblind'
sns.set_palette('muted')

# Create plot with new palette
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(x='day', y='tip_pct', hue='time', data=tips, ax=ax)
ax.set_title('Bar Plot with Muted Palette')
fig

## Advanced: Joint Plots for Bivariate Analysis

`jointplot()` shows the relationship between two variables with marginal distributions on the axes. This is useful for understanding how two variables are related.

In [None]:
# Create a joint plot
# Code explanation:
# - kind='scatter': scatter plot in center
# - marginal plots on axes show distributions of each variable

g = sns.jointplot(data=tips, x='total_bill', y='tip', kind='scatter', height=6)
g.fig.suptitle('Relationship between Total Bill and Tip', y=1.00)
plt.show()

In [None]:
# Joint plot with hex density
# This is useful when you have many overlapping points
g = sns.jointplot(data=tips, x='total_bill', y='tip', kind='hex', height=6)
g.fig.suptitle('2D Density of Total Bill vs Tip', y=1.00)
plt.show()

## Advanced: Heatmaps for Correlation

Heatmaps are useful for visualizing correlation matrices or other 2D data. They use color intensity to represent values.

In [None]:
# Create correlation matrix
corr_data = tips[['total_bill', 'tip', 'tip_pct']].corr()
print('Correlation Matrix:')
print(corr_data)

# Code explanation:
# - sns.heatmap() visualizes matrix data with colors
# - annot=True adds correlation values to cells
# - cmap='coolwarm' uses blue-red color scheme

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr_data, annot=True, cmap='coolwarm', center=0,
            square=True, linewidths=1, cbar_kws={'label': 'Correlation'},
            ax=ax)
ax.set_title('Correlation Heatmap')
fig

## Practice Exercises

Try these exercises to reinforce your learning. Solutions are provided below each exercise for reference.

### Exercise 1: Multiple Subplots with Custom Styling

**Task**: Create a 3x2 subplot grid with line plots. For each subplot:
- Plot a cumulative random walk (`np.random.standard_normal(100).cumsum()`)
- Add a title with the plot number
- Set x-label and y-label
- Adjust spacing with `tight_layout()`

In [None]:
# Exercise 1 Solution
fig, axes = plt.subplots(3, 2, figsize=(12, 10))

for i in range(3):
    for j in range(2):
        data = np.random.standard_normal(100).cumsum()
        axes[i, j].plot(data, linewidth=2)
        axes[i, j].set_title(f'Random Walk {i*2+j+1}', fontweight='bold')
        axes[i, j].set_xlabel('Time')
        axes[i, j].set_ylabel('Value')
        axes[i, j].grid(True, alpha=0.3)

fig.tight_layout()
fig

### Exercise 2: Advanced Styling with Format Strings

**Task**: Create a line plot with:
- Magenta color (`'m'`)
- Star marker (`'*'`)
- Dash-dot line style (`'-.'`)
- Custom x-axis labels ('Start', 'Q1', 'Mid', 'Q3', 'End')
- Proper axis labels and title

In [None]:
# Exercise 2 Solution
fig, ax = plt.subplots(figsize=(10, 5))

data = np.random.standard_normal(20).cumsum()
# Format string 'm*-.': color 'm' (magenta) + marker '*' (star) + line style '-.' (dash-dot)
ax.plot(data, 'm*-.', linewidth=2.5, markersize=10, label='Custom Styled Line')

# Custom tick labels: five tick positions to match the five labels
ax.set_xticks([0, 5, 10, 15, 19])
ax.set_xticklabels(['Start', 'Q1', 'Mid', 'Q3', 'End'])

ax.set_xlabel('Time Period', fontsize=11, fontweight='bold')
ax.set_ylabel('Cumulative Value', fontsize=11, fontweight='bold')
ax.set_title('Advanced Styling Exercise', fontsize=13, fontweight='bold')
ax.legend(loc='best')
ax.grid(True, alpha=0.3)

fig.tight_layout()
fig

### Exercise 3: Seaborn Categorical Visualization

**Task**: Create a seaborn barplot showing:
- x-axis: 'day'
- y-axis: 'total_bill'
- hue: 'time' (Lunch/Dinner distinction)
- Add an appropriate title

In [None]:
# Exercise 3 Solution
sns.set_style('whitegrid')
fig, ax = plt.subplots(figsize=(10, 6))

sns.barplot(x='day', y='total_bill', hue='time', data=tips, ax=ax, palette='Set2')
ax.set_title('Average Total Bill by Day and Time', fontsize=13, fontweight='bold')
ax.set_xlabel('Day of Week', fontsize=11)
ax.set_ylabel('Total Bill ($)', fontsize=11)

fig.tight_layout()
fig

### Exercise 4: Distribution Analysis

**Task**: Create a histogram with KDE overlay for the 'total_bill' column:
- Use seaborn's `histplot()`
- Set bins=40
- Enable KDE with `kde=True`
- Add labels and title

In [None]:
# Exercise 4 Solution
fig, ax = plt.subplots(figsize=(10, 5))

sns.histplot(tips['total_bill'], bins=40, kde=True, stat='density', ax=ax, color='steelblue')
ax.set_title('Distribution of Total Bill', fontsize=13, fontweight='bold')
ax.set_xlabel('Total Bill ($)', fontsize=11)
ax.set_ylabel('Density', fontsize=11)

fig.tight_layout()
fig

### Challenge Exercise: Multi-panel Comparison

**Task**: Create a 2x2 subplot grid comparing distributions:
1. Top-left: Histogram of total_bill
2. Top-right: Histogram of tip
3. Bottom-left: Box plot of total_bill by day
4. Bottom-right: Scatter plot of total_bill vs tip

Make sure to label all axes and add titles to each subplot.

In [None]:
# Challenge Solution
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Top-left: Histogram of total_bill
axes[0, 0].hist(tips['total_bill'], bins=30, color='skyblue', edgecolor='black', alpha=0.7)
axes[0, 0].set_title('Distribution of Total Bill', fontweight='bold')
axes[0, 0].set_xlabel('Total Bill ($)')
axes[0, 0].set_ylabel('Frequency')

# Top-right: Histogram of tip
axes[0, 1].hist(tips['tip'], bins=30, color='lightcoral', edgecolor='black', alpha=0.7)
axes[0, 1].set_title('Distribution of Tip', fontweight='bold')
axes[0, 1].set_xlabel('Tip ($)')
axes[0, 1].set_ylabel('Frequency')

# Bottom-left: Box plot
tips.boxplot(column='total_bill', by='day', ax=axes[1, 0])
fig.suptitle('')  # Remove the figure-level title that pandas adds when using by=
axes[1, 0].set_title('Total Bill by Day', fontweight='bold')
axes[1, 0].set_xlabel('Day')
axes[1, 0].set_ylabel('Total Bill ($)')
axes[1, 0].tick_params(axis='x', labelrotation=0)

# Bottom-right: Scatter plot
axes[1, 1].scatter(tips['total_bill'], tips['tip'], alpha=0.6, s=100, color='green')
axes[1, 1].set_title('Total Bill vs Tip', fontweight='bold')
axes[1, 1].set_xlabel('Total Bill ($)')
axes[1, 1].set_ylabel('Tip ($)')

fig.tight_layout()
fig