# Seaborn Basics: Introduction to Statistical Data Visualization

## Overview
Seaborn is a Python data visualization library based on **Matplotlib**. It provides a high-level interface for drawing attractive and informative statistical graphics.

### Key Concepts Covered:
1. **Seaborn vs Matplotlib**: The relationship between the two libraries.
2. **Dataset Loading**: Using built-in datasets and Pandas integration.
3. **The Relational API**: Using `relplot()` for figure-level plotting.
4. **Basic Plots**: Syntax for `scatterplot()` and `lineplot()`.
5. **Aesthetics**: Mapping `hue`, `style`, and `size` to variables.


In [None]:
# Setup: Import necessary libraries
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Apply Seaborn's theme with the "darkgrid" style (gray background, white grid lines)
sns.set_theme(style="darkgrid")

## 1. Seaborn vs Matplotlib: The Relationship

### Understanding the Hierarchy
- **Matplotlib**: The foundational plotting library. It is powerful and customizable but can be verbose.
- **Seaborn**: Built *on top* of Matplotlib. It acts as a high-level API, making complex plots (like heatmaps or regression plots) easier to create with fewer lines of code.

| Feature | Matplotlib | Seaborn |
|---------|------------|---------|
| **Level** | Low-level (fine-grained control over every figure element) | High-level (one call per statistical plot) |
| **Syntax** | `plt.plot(x, y)` | `sns.lineplot(data=df, x='col1', y='col2')` |
| **DataFrames**| Accepts arrays or Series; you usually pass each column explicitly | Takes a DataFrame plus column names and labels axes and legends from them |
| **Defaults** | Plain default styling | Themed, polished defaults |

> **Note**: Since Seaborn uses Matplotlib under the hood, you can use `plt` functions (like `plt.title()`, `plt.show()`) to customize Seaborn plots.
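Axes-level Seaborn functions also return the Matplotlib `Axes` they drew on, so you can customize a plot through that object instead of the `plt` state machine. The sketch below, using a small made-up DataFrame (the `x_val`/`y_val` column names are hypothetical), contrasts the "DataFrames" row of the table: plain Matplotlib adds no axis labels, while Seaborn labels the axes from the column names.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up DataFrame (hypothetical column names)
df = pd.DataFrame({"x_val": [1, 2, 3, 4], "y_val": [2, 4, 3, 5]})

# Plain Matplotlib: pass the columns yourself; no axis labels are added
fig_mpl, ax_mpl = plt.subplots()
ax_mpl.scatter(df["x_val"], df["y_val"])

# Seaborn axes-level function: returns the Axes it drew on and
# labels both axes from the column names
fig_sns, ax_sns = plt.subplots()
ax_sns = sns.scatterplot(data=df, x="x_val", y="y_val", ax=ax_sns)

# Ordinary Matplotlib Axes methods customize the Seaborn plot
ax_sns.set_title("Customized via the returned Axes")
ax_sns.set_xlabel("X value (custom label)")
```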

## 2. Dataset Loading

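Seaborn provides example datasets (cleaned and ready for practice) through `sns.load_dataset()`, which returns each one as a **Pandas DataFrame**. The files are downloaded from Seaborn's online `seaborn-data` repository the first time you load them and then cached locally, so the first call needs an internet connection.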

Common datasets: `'tips'`, `'penguins'`, `'iris'`, `'flights'`.

In [None]:
# Load the 'tips' dataset
tips = sns.load_dataset('tips')

# Check the first few rows
print("Shape of dataset:", tips.shape)
print(tips.head())

## 3. The Relational API: `relplot()`

Seaborn splits its functions into **Figure-level** and **Axes-level** functions.

- **Figure-level (`relplot`)**: Manages the entire figure, including subplots (faceting). It returns a `FacetGrid`.
- **Axes-level (`scatterplot`, `lineplot`)**: Draws onto a single Matplotlib `Axes` (the current one, or the one passed via `ax=`) and returns that `Axes` object.

Use `relplot()` when you want to create relational plots (scatter or line) and potentially split them across multiple subplots.
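The difference in return types is easy to check directly. A small sketch on hypothetical toy data (columns `a` and `b` are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [3, 1, 4, 2]})

# Figure-level: builds its own figure and returns a FacetGrid
g = sns.relplot(data=df, x="a", y="b", kind="scatter")
print(type(g).__name__)  # FacetGrid

# Axes-level: draws on the Axes passed via ax= and returns that same Axes
fig, ax = plt.subplots()
returned = sns.scatterplot(data=df, x="a", y="b", ax=ax)
print(returned is ax)  # True
```

In practice this means figure-level plots are customized through the grid object (`g.set(...)`, `g.figure`), while axes-level plots are customized through the returned `Axes`.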

In [None]:
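# Using relplot for a scatter plot (figure-level: returns a FacetGrid)
g = sns.relplot(
    data=tips,
    x="total_bill",
    y="tip",
    kind="scatter"  # Default is 'scatter'; can also be 'line'
)
# Title through the grid: plt.title() only targets the current Axes,
# which matters once the plot has several facets
g.set(title="Total Bill vs Tip (Figure Level)")
plt.show()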

## 4. Basic Scatter & Line Plots

For simple, standalone plots, or when a plot must go on a specific Matplotlib subplot (via the `ax=` parameter), use the axes-level functions directly.

### Syntax
```python
sns.scatterplot(data=df, x='x_col', y='y_col')
sns.lineplot(data=df, x='x_col', y='y_col')
```

In [None]:
# Create sample data for lineplot
df_line = pd.DataFrame({
    'time': range(10),
    'value': [1, 3, 2, 5, 4, 8, 7, 9, 11, 10]
})

# Create two plots side-by-side using Matplotlib subplots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# 1. Scatter Plot
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=axes[0])
axes[0].set_title("Scatter: Bill vs Tip")

# 2. Line Plot
sns.lineplot(data=df_line, x="time", y="value", ax=axes[1])
axes[1].set_title("Line: Time Series Example")

plt.show()

## 5. Hue, Style, and Size (Semantic Mappings)

Seaborn allows you to map data variables to visual properties (semantics).

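- **`hue`**: Maps a variable to **color**. Categorical variables get distinct colors; numeric variables get a color gradient.
- **`style`**: Maps a variable to **marker style** (dots, crosses, etc.) in scatter plots or **line style** (solid, dashed) in line plots.
- **`size`**: Maps a variable to **marker size** in scatter plots or **line width** in line plots.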

### Example: Enhancing the Scatter Plot

In [None]:
# Complex scatter plot with multiple semantics
plt.figure(figsize=(8, 6))

ax = sns.scatterplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="smoker",      # Color by smoker status (Categorical)
    style="time",      # Shape by time of day (Categorical)
    size="size",       # Size by party size (Numerical)
    sizes=(20, 200)    # Range of marker areas (points^2) to use
)

plt.title("Multidimensional View of Tipping Behavior")
# Move the legend outside the plot area (move_legend re-creates it at the new position)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.05, 1))
plt.show()
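
The `sizes=(20, 200)` tuple sets the smallest and largest marker area, and values in between are scaled linearly from the data's range onto it. A sketch on hypothetical data (the `n` column is made up) that reads the resulting areas back from the Matplotlib scatter collection:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: numeric column "n" mapped to marker size
df = pd.DataFrame({"x": [1, 2, 3], "y": [1, 2, 3], "n": [1, 2, 3]})

fig, ax = plt.subplots()
sns.scatterplot(data=df, x="x", y="y", size="n", sizes=(20, 200),
                legend=False, ax=ax)

# min(n) -> 20, max(n) -> 200, and n = 2 lands halfway at 110
point_areas = ax.collections[0].get_sizes()
print(point_areas)
```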

--- 
## 📝 Practice Set

**Dataset**: Load the penguins dataset: `penguins = sns.load_dataset('penguins')`

1. **Check Data**: Print the first 5 rows and check the shape.
2. **Simple Scatter**: Create a scatter plot of `bill_length_mm` (x) vs `bill_depth_mm` (y).
3. **Adding Dimensions**: 
   - Map `species` to `hue`.
   - Map `island` to `style`.
4. **Relplot**: Recreate the plot from step 3 using `sns.relplot()`, split into one subplot per `sex` (using `col="sex"`).

### Solution Template

In [None]:
# Load data
penguins = sns.load_dataset('penguins')

# 1. Check Data
# TODO: print head and shape

# 2 & 3. Scatter with Hue and Style
plt.figure(figsize=(10, 6))
# TODO: sns.scatterplot(...)
plt.show()

# 4. Relplot with columns
# TODO: sns.relplot(...)
