# Visualization with Seaborn

Matplotlib is a powerful library, but it can be quite verbose. Seaborn is a library built on top of Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Import Seaborn
import seaborn as sns
sns.set_theme(style="whitegrid")  # Apply Seaborn's theme with a white grid background

## Seaborn Versus Matplotlib

Seaborn's main goal is to make visualization a central part of exploring and understanding data. It provides dataset-oriented APIs so that you can focus on what the variables mean, rather than on the details of how to draw them.

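For simple plots, Seaborn is often much less verbose than Matplotlib. You pass a DataFrame and name the columns to plot, and Seaborn takes care of grouping, colors, axis labels, and legends.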
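To make the difference concrete, here is a minimal side-by-side sketch that draws the same grouped scatter plot both ways. The two-group DataFrame is synthetic and its column names (`x`, `y`, `group`) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical two-group dataset, generated only for this comparison
rng = np.random.default_rng(0)
n = 150
x = rng.normal(size=n)
df = pd.DataFrame({
    "x": x,
    "y": x + rng.normal(scale=0.5, size=n),
    "group": rng.choice(["A", "B"], size=n),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Matplotlib: split the data by group yourself, then add labels and a legend
for name, sub in df.groupby("group"):
    ax_mpl.scatter(sub["x"], sub["y"], label=name)
ax_mpl.set_xlabel("x")
ax_mpl.set_ylabel("y")
ax_mpl.legend(title="group")
ax_mpl.set_title("Matplotlib")

# Seaborn: name the columns; grouping, colors, axis labels and legend follow
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax_sns)
ax_sns.set_title("Seaborn")
plt.show()
```

The Matplotlib panel needs an explicit loop and four labeling calls; the Seaborn panel gets the same information from a single call because it reads the column names from the DataFrame.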

## Exploring Seaborn Plots

Let's look at a few of Seaborn's most useful plot types.

### Histograms and KDEs
Seaborn makes it easy to create histograms and kernel density estimates (KDEs) to visualize distributions.

In [None]:
# Draw 2,000 samples from a correlated two-dimensional normal distribution
data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000)
df = pd.DataFrame(data, columns=['x', 'y'])

# Plot a histogram and KDE for each variable
for col in 'xy':
    sns.histplot(df[col], kde=True)
    plt.show()

### Pair Plots
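The `pairplot` function is a fantastic tool for visualizing the relationships between all pairs of variables in a dataset: it draws a scatter plot for every pair of numeric columns and a histogram or KDE of each variable on the diagonal.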

In [None]:
# Load the iris dataset from seaborn
iris = sns.load_dataset("iris")

# Create a pairplot
sns.pairplot(iris, hue='species', height=2.5)
plt.show()

### Categorical Plots
Seaborn excels at visualizing categorical data. Box plots and violin plots, for example, show how a numeric variable is distributed within each category.

In [None]:
# Load the tips dataset
tips = sns.load_dataset("tips")

# Create a violin plot to show the distribution of total_bill for each day
sns.violinplot(x="day", y="total_bill", data=tips)
plt.show()
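The text mentions box plots, but only a violin plot is drawn above. Here is a minimal box-plot sketch. It uses made-up bill amounts rather than the `tips` data so it runs offline; the `bills` frame and its gamma-distributed values are assumptions for illustration. Each box spans the interquartile range (IQR) with a line at the median. By default the whiskers reach the most extreme points within 1.5 IQR of the box, and anything beyond that is drawn as an outlier.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical bill amounts: 40 per day, right-skewed like real restaurant bills
rng = np.random.default_rng(0)
days = ["Thur", "Fri", "Sat", "Sun"]
bills = pd.DataFrame({
    "day": np.repeat(days, 40),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=160),
})

# Box plot of total_bill per day, with the days in calendar order
ax = sns.boxplot(x="day", y="total_bill", data=bills, order=days)
plt.show()
```

Compared with the violin plot, the box plot drops the density shape in exchange for a compact summary that is easy to compare across many categories.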

### Regression Plots
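The `lmplot` function fits a linear regression of one variable on another and draws the fitted line over a scatter plot of the data. The shaded band around the line is a 95% confidence interval for the regression, estimated by bootstrapping.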

In [None]:
sns.lmplot(x="total_bill", y="tip", data=tips)
plt.show()
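`lmplot` is a figure-level function: it creates its own figure and returns a `FacetGrid`. Its axes-level counterpart, `regplot`, draws the same kind of fit onto a Matplotlib `Axes` that you pass in, which is handy inside a larger subplot layout. This sketch uses synthetic bill and tip values (the `bill_tips` frame is hypothetical). It also checks that the drawn line agrees with an ordinary least-squares fit from `np.polyfit`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: tips of roughly 15% of the bill, plus noise
rng = np.random.default_rng(0)
n = 100
total_bill = rng.uniform(5, 50, size=n)
tip = 0.15 * total_bill + rng.normal(scale=1.0, size=n)
bill_tips = pd.DataFrame({"total_bill": total_bill, "tip": tip})

# Figure-level: lmplot builds its own figure and returns a FacetGrid
g = sns.lmplot(x="total_bill", y="tip", data=bill_tips)

# Axes-level: regplot draws onto the Axes we supply
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=bill_tips, ax=ax)

# The regression line is the first Line2D on the axes; compare its slope to OLS
slope, intercept = np.polyfit(bill_tips["total_bill"], bill_tips["tip"], deg=1)
line_x, line_y = ax.lines[0].get_data()
drawn_slope = (line_y[-1] - line_y[0]) / (line_x[-1] - line_x[0])
print(f"polyfit slope: {slope:.4f}, drawn slope: {drawn_slope:.4f}")
plt.show()
```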

## Example: Exploring Marathon Finishing Times

Let's use Seaborn to explore a synthetic dataset of marathon finishing times.

In [None]:
# Create a synthetic dataset (seeded so the results are reproducible)
rng = np.random.RandomState(0)
n = 100
age = rng.randint(18, 65, size=n)  # ages 18 to 64

# Simulate that finish time increases slightly with age
finish_time_hours = 4.5 + (age - 18) * 0.01 + rng.randn(n) * 0.2

# Simulate that men are slightly faster on average
gender = rng.choice(['Male', 'Female'], size=n, p=[0.6, 0.4])
finish_time_hours[gender == 'Male'] -= 0.2

# Apply the 2.5-hour floor last (no one is that fast), so no adjustment can push a time below it
finish_time_hours = np.clip(finish_time_hours, 2.5, None)

marathon_df = pd.DataFrame({'age': age, 'gender': gender, 'time_hours': finish_time_hours})

We can use a violin plot to compare the distribution of finishing times between genders.

In [None]:
sns.violinplot(x="gender", y="time_hours", data=marathon_df)
plt.show()
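A violin plot is easiest to read alongside the numbers it summarizes. This sketch re-simulates the marathon data with a seeded generator under the name `marathon_sim` (a hypothetical name, chosen so the cell does not overwrite `marathon_df`). It then tabulates each group's size, median, and interquartile range with a pandas `groupby`. These are the same median and quartiles that the default inner box of each violin marks.

```python
import numpy as np
import pandas as pd

# Re-simulate the marathon data with a seeded generator so this cell runs on its own
rng = np.random.RandomState(0)
n = 100
age = rng.randint(18, 65, size=n)
time_hours = 4.5 + (age - 18) * 0.01 + rng.randn(n) * 0.2
gender = rng.choice(["Male", "Female"], size=n, p=[0.6, 0.4])
time_hours[gender == "Male"] -= 0.2
marathon_sim = pd.DataFrame({"age": age, "gender": gender, "time_hours": time_hours})

# Group sizes, medians and quartiles of finishing time for each gender
summary = marathon_sim.groupby("gender")["time_hours"].agg(
    count="count",
    median="median",
    q1=lambda s: s.quantile(0.25),
    q3=lambda s: s.quantile(0.75),
)
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```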

We can use `lmplot` with `hue="gender"` to see the relationship between age and finishing time, with a separate regression line fitted for each gender.

In [None]:
sns.lmplot(x="age", y="time_hours", hue="gender", data=marathon_df)
plt.show()
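With `hue`, `lmplot` fits an independent linear regression for each level of the hue variable. As a rough numerical check, this sketch fits the same per-gender lines with `np.polyfit` on a re-simulated copy of the data. The names `marathon_sim` and `fit_line` are hypothetical and introduced here. The sketch then evaluates each line at age 40 to compare the groups at a common age.

```python
import numpy as np
import pandas as pd

# Re-simulate the marathon data with a seeded generator so this cell runs on its own
rng = np.random.RandomState(0)
n = 100
age = rng.randint(18, 65, size=n)
time_hours = 4.5 + (age - 18) * 0.01 + rng.randn(n) * 0.2
gender = rng.choice(["Male", "Female"], size=n, p=[0.6, 0.4])
time_hours[gender == "Male"] -= 0.2
marathon_sim = pd.DataFrame({"age": age, "gender": gender, "time_hours": time_hours})


def fit_line(group):
    """Ordinary least-squares fit of time_hours on age for one group."""
    slope, intercept = np.polyfit(group["age"], group["time_hours"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})


# One row per gender: slope and intercept of that group's regression line
fits = marathon_sim.groupby("gender")[["age", "time_hours"]].apply(fit_line)
fits["time_at_40"] = fits["intercept"] + 40 * fits["slope"]
print(fits)
```

Both slopes should land near the simulated 0.01 hours per year of age, and the predicted times at age 40 should differ by roughly the simulated 0.2-hour gap.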