# Seaborn for AI Engineers (2025): A Q&A Tutorial
Your guide to mastering data visualization, one question at a time.

Welcome to your one-week journey to becoming proficient in Seaborn! This notebook is structured in a question-and-answer format to help you learn incrementally. Each section builds on the last, starting from the absolute basics and moving toward advanced applications relevant to AI and Machine Learning.

Let's get started!

## 1. Getting Started: Setup and Imports
The first step in any Python project is setting up your environment and importing the necessary libraries.

Question: Write a single line of code to import the Seaborn library. What is the standard alias (nickname) used by convention?

In [None]:
# The standard convention is to import seaborn as 'sns'
import seaborn as sns

Question: Seaborn is built on top of Matplotlib. How do you import the pyplot module from Matplotlib, and why do we use the %matplotlib inline command in Jupyter Notebooks?

In [None]:
# We import pyplot for further plot customization
import matplotlib.pyplot as plt

# This 'magic' command renders plots directly below the cell that creates them.
# Recent Jupyter versions usually do this by default, so the line is often optional.
%matplotlib inline

Question: How can you set a default theme for all your plots to ensure a consistent and professional look?

In [None]:
# sns.set_theme() applies a default theme. 'style' is one of several parameters.
# 'whitegrid' is a popular choice for its clean look.
sns.set_theme(style="whitegrid")

## 2. Basic Plots: Visualizing a Single Variable
Let's start by looking at the distribution of data for a single variable.

Question: How do you create a simple histogram to see the frequency distribution of a list of numbers?

In [None]:
# Let's define some sample data first
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7]

# sns.histplot() is used to create a histogram.
# 'bins' controls how many bars the data is grouped into.
sns.histplot(data=data, bins=5)

plt.title('Simple Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

Question: How can you add a Kernel Density Estimate (KDE) line to the histogram to better visualize the shape of the distribution?

In [None]:
# The 'kde=True' parameter automatically calculates and plots the density curve.
sns.histplot(data=data, kde=True, bins=5)

plt.title('Histogram with KDE')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

## 3. Statistical Plots: Exploring Relationships
Now, let's explore how two different variables relate to each other.

Question: How do you load one of Seaborn's built-in sample datasets, for example, the 'tips' dataset?

In [None]:
# sns.load_dataset() downloads a sample dataset from Seaborn's online repository
# (internet access is needed the first time; the file is cached afterwards).
tips = sns.load_dataset('tips')

# Let's preview the first 5 rows to understand its structure
tips.head()

Question: How can you visualize the relationship between two numerical variables ('total_bill' and 'tip') and also see the regression line that fits this data?

In [None]:
# sns.lmplot() creates a scatter plot and fits a linear regression model.
sns.lmplot(x='total_bill', y='tip', data=tips, height=6, aspect=1.5)

plt.title('Total Bill vs Tip with Regression Line')
plt.show()
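Seaborn draws the regression line but does not hand back its coefficients. If you need the numbers, one option is to fit the same straight line yourself. The sketch below is an illustration under assumptions: it uses synthetic data (the `demo` frame is made up, not the tips dataset), `sns.regplot` as the axes-level counterpart of `lmplot`, and `np.polyfit` for the least-squares fit.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data: tip is roughly 1.0 + 0.1 * total_bill
rng = np.random.default_rng(0)
bill = rng.uniform(5, 50, size=100)
demo = pd.DataFrame({
    "total_bill": bill,
    "tip": 1.0 + 0.1 * bill + rng.normal(0, 0.5, size=100),
})

# Axes-level regression plot (scatter + fitted line + confidence band)
fig, ax = plt.subplots()
sns.regplot(x="total_bill", y="tip", data=demo, ax=ax)

# Fit the same degree-1 polynomial to recover slope and intercept
slope, intercept = np.polyfit(demo["total_bill"], demo["tip"], deg=1)
ax.set_title(f"tip = {slope:.2f} * total_bill + {intercept:.2f}")
plt.close(fig)

print(slope, intercept)
```

For anything beyond a quick look (standard errors, p-values), a statistics library such as statsmodels or scikit-learn is the better tool.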

Question: How can you create a box plot to show the distribution of 'total_bill' for each 'day' of the week?

In [None]:
# A box plot is excellent for comparing distributions across categories.
# It shows the median, quartiles, and potential outliers.
sns.boxplot(x='day', y='total_bill', data=tips)

plt.title('Daily Bill Distribution')
plt.show()
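The numbers behind a box plot can be computed directly with pandas. This is a small sketch on made-up data (the `demo` frame and its values are assumptions for illustration): the box spans the first to third quartile, and the line inside it is the median.

```python
import pandas as pd

# Hypothetical bills for two days
demo = pd.DataFrame({
    "day": ["Sat"] * 5 + ["Sun"] * 5,
    "total_bill": [10, 12, 14, 16, 18, 20, 25, 30, 35, 40],
})

# Quartiles per category: these correspond to the box edges and the median line
summary = demo.groupby("day")["total_bill"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]

# The interquartile range (IQR) is the height of the box
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```

By default the whiskers extend to the most extreme points within 1.5 times the IQR of the box, and anything beyond that is drawn as an individual outlier.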

## 4. Customizing Plots with Seaborn
Make your visualizations more informative and visually appealing by customizing them.

Question: How can you customize a plot's appearance by changing its style and color palette?

In [None]:
# You can set the style and palette globally for all subsequent plots.
sns.set_style("darkgrid")
sns.set_palette("husl") # 'husl' gives evenly spaced, distinct hues, which suits categorical data.

# Let's create a bar plot to see the effect.
sns.barplot(x='day', y='total_bill', data=tips)

plt.title('Daily Bill Averages with Custom Style')
plt.show()

# Let's reset to our default for the rest of the tutorial
sns.set_theme(style="whitegrid", palette="deep")
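If you only want a different look for one plot, you can avoid the set-then-reset pattern by using context managers. The sketch below assumes synthetic data (the `demo` frame is made up) and relies on `sns.axes_style()` and `sns.color_palette()`, both of which can be used in a `with` statement.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme(style="whitegrid", palette="deep")

# Hypothetical bills, two per day
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [15.0, 20.0, 14.0, 19.0, 18.0, 24.0, 19.0, 25.0],
})

# The style and palette apply only inside the 'with' block.
# The axes must be created inside the block to pick up the style.
with sns.axes_style("darkgrid"), sns.color_palette("husl"):
    fig, ax = plt.subplots()
    sns.barplot(x="day", y="total_bill", data=demo, ax=ax)
    styled_facecolor = ax.get_facecolor()
    styled_palette = list(sns.color_palette())
plt.close(fig)

# Outside the block, new axes use the global theme again
fig, ax = plt.subplots()
default_facecolor = ax.get_facecolor()
default_palette = list(sns.color_palette())
plt.close(fig)
```

This keeps the global theme untouched, so there is nothing to reset afterwards.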

Question: How can you customize a single plot by mapping data variables to aesthetic properties like color (hue) and size?

In [None]:
# sns.relplot is a figure-level function for relational plots (scatter by default).
# hue: colors points by a variable; here 'size' (party size) is numeric, so a
#      sequential palette is used.
# size: scales the point markers by a numerical or categorical variable.
# palette: specifies the color map to use for the 'hue' variable.
# sizes: specifies the (min, max) marker size.
scatter = sns.relplot(
    x='total_bill', 
    y='tip', 
    data=tips,
    height=6,
    aspect=1.2,
    hue='size',        # Color by party size
    palette='viridis', # Use the 'viridis' color palette
    size='size',       # Vary point size by party size
    sizes=(20, 200)    # Range of point sizes
)

# '.figure' is the current name for the underlying Matplotlib figure ('.fig' is deprecated).
scatter.figure.suptitle('Customized Scatter Plot', y=1.03)
plt.show()

## 5. Handling and Visualizing Categorical Data
Categorical data is common in AI/ML. Let's see how to visualize it effectively.

Question: How can you use catplot to create a violin plot that shows the distribution of 'total_bill' by 'day', further split by 'sex'?

In [None]:
# sns.catplot is a versatile function for plotting categorical data.
# kind='violin': shows the distribution shape (like a KDE) and a boxplot inside.
# hue='sex': creates separate violins for 'Male' and 'Female'.
# split=True: combines the 'hue' violins into a single, split violin for easier comparison.
sns.catplot(
    x='day', 
    y='total_bill', 
    hue='sex',
    data=tips, 
    kind='violin',
    split=True,
    height=6,
    aspect=1.5
)

plt.title('Bill Distribution by Day and Gender')
plt.show()

## 6. Creating Multi-Plot Grids
Sometimes you need to create many similar plots for different subsets of your data. FacetGrid makes this easy.

Question: How can you create a grid of scatter plots showing the 'total_bill' vs 'tip' relationship, but with separate plots for each combination of 'time' (Lunch/Dinner) and 'smoker' (Yes/No)?

In [None]:
# 1. Initialize a FacetGrid, specifying the data and the variables for rows and columns.
grid = sns.FacetGrid(tips, col='time', row='smoker', height=4, aspect=1.2)

# 2. Use the .map() method to apply a plotting function to each subset of the data.
grid.map(sns.scatterplot, 'total_bill', 'tip')

# '.figure' is the current name for the underlying Matplotlib figure ('.fig' is deprecated).
grid.figure.suptitle('Multi-Plot Grid by Time and Smoking Status', y=1.03)
plt.show()
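The two-step FacetGrid pattern can often be shortened: figure-level functions such as `sns.relplot` accept `row` and `col` directly and return a FacetGrid. The sketch below uses synthetic data (the `demo` frame is an assumption standing in for the tips dataset) to build the same 2x2 layout in one call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in with every time/smoker combination present
rng = np.random.default_rng(1)
n = 80
demo = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, size=n),
    "time": np.tile(["Lunch", "Dinner"], n // 2),
    "smoker": np.repeat(["Yes", "No"], n // 2),
})
demo["tip"] = 0.15 * demo["total_bill"] + rng.normal(0, 0.5, size=n)

# One call: relplot builds the grid and draws a scatter plot in each facet
g = sns.relplot(
    data=demo, x="total_bill", y="tip",
    col="time", row="smoker", height=4, aspect=1.2,
)
g.figure.suptitle("Grid built with relplot", y=1.03)

n_rows, n_cols = g.axes.shape
plt.close(g.figure)
```

Building a FacetGrid by hand is still useful when you want to map a custom plotting function onto each facet.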

## 7. PairPlots and Heatmaps: The Big Picture
These plots are essential for quickly exploring all relationships in a dataset.

Question: How can you visualize the pairwise relationships between all numerical variables in a dataset at once? Let's use the 'iris' dataset.

In [None]:
# First, load the iris dataset
iris = sns.load_dataset('iris')

# sns.pairplot creates a grid where the diagonal shows each variable's distribution
# (layered KDE curves when 'hue' is set, histograms otherwise),
# and off-diagonals are scatter plots of each pair of variables.
# hue='species' colors the plots by the flower species.
sns.pairplot(iris, hue='species', height=2.5)

plt.suptitle('Pairwise Relationships in Iris Dataset', y=1.02)
plt.show()

Question: How do you compute a correlation matrix and visualize it as a heatmap to easily identify which variables are strongly correlated?

In [None]:
# 1. Calculate the correlation matrix for numeric columns only.
corr = iris.corr(numeric_only=True)

# 2. Use sns.heatmap to visualize the matrix.
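# annot=True: writes the correlation value in each cell.
# cmap='coolwarm': a diverging color map running from cool (blue) to warm (red).
# vmin=-1, vmax=1: pins the color scale to the full correlation range, so that
# negative correlations are blue, positive ones are red, and zero sits in the middle.
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)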

plt.title('Correlation Heatmap')
plt.show()
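A correlation matrix is symmetric, so half of the heatmap is redundant. A common refinement is to hide the upper triangle with the `mask` parameter of `sns.heatmap`. The sketch below uses synthetic data (the columns `a` to `d` are made up for illustration).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic features: 'b' follows 'a', 'c' moves against 'a', 'd' is unrelated
rng = np.random.default_rng(2)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.5, size=200),
    "c": -base + rng.normal(scale=0.5, size=200),
    "d": rng.normal(size=200),
})
corr = demo.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Lower-Triangle Correlation Heatmap")
plt.close(fig)
```

Using `k=1` keeps the diagonal visible; use `k=0` if you would rather hide the trivial 1.00 entries too.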

## 8. Integrating Seaborn with Matplotlib
Combine the power of Seaborn's high-level functions with Matplotlib's customization capabilities.

Question: How can you create a figure with two subplots side-by-side, placing a Seaborn plot on the left axis and a standard Matplotlib plot on the right?

In [None]:
# 1. Create a figure and a set of subplots with Matplotlib.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# 2. Create the Seaborn plot, specifying the target axis with the 'ax' parameter.
sns.scatterplot(x='total_bill', y='tip', data=tips, ax=ax1)
ax1.set_title('Seaborn Scatter Plot')

# 3. Create the Matplotlib plot on the other axis.
ax2.hist(tips['total_bill'], bins=15, color='skyblue', edgecolor='black')
ax2.set_title('Matplotlib Histogram')

plt.tight_layout() # Adjusts plot parameters for a tight layout
plt.show()

## 9. Advanced Visualization for AI/ML
Let's apply our skills to common tasks in a machine learning workflow, like exploratory data analysis (EDA) and model evaluation.

Question: In EDA for a classification problem, how can you visualize and compare the distributions of multiple features at once? (Using the 'wine' dataset).

In [None]:
# We need two more libraries: pandas for DataFrames and scikit-learn for the wine dataset
import pandas as pd
from sklearn.datasets import load_wine

# Load and prepare the data
wine_data = load_wine()
wine_df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)

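# A violin plot is great for comparing multiple distributions.
# Note: these features are on different scales, so a wide-range feature such as
# 'magnesium' dominates the shared y-axis. Standardize the columns first if you
# mainly want to compare distribution shapes.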
plt.figure(figsize=(12, 7))
sns.violinplot(data=wine_df.iloc[:, :5], inner="quartile", palette="Set3")
plt.title('Feature Distributions for Wine Dataset')
plt.xticks(rotation=45)
plt.show()

Question: After training a classifier, a confusion matrix is used to evaluate its performance. How can you use a Seaborn heatmap to create a clear and informative visualization of a confusion matrix?

In [None]:
# Import necessary scikit-learn modules
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# 1. Prepare data and train a simple model
X_train, X_test, y_train, y_test = train_test_split(
    wine_data.data, wine_data.target, test_size=0.3, random_state=42)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 2. Calculate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# 3. Plot the confusion matrix using a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=wine_data.target_names, 
            yticklabels=wine_data.target_names)
plt.title('Confusion Matrix for Wine Classification')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
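Raw counts can be hard to compare when classes have different sizes. A row-normalized confusion matrix shows, for each true class, the fraction of samples assigned to each predicted class, so the diagonal reads as per-class recall. The sketch below uses a hypothetical count matrix (the numbers are made up, not the output of the model above). scikit-learn's `confusion_matrix` also accepts `normalize='true'`, which should give the same result.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Hypothetical confusion matrix: rows are true classes, columns are predictions
cm = np.array([
    [18, 1, 0],
    [2, 19, 0],
    [0, 1, 13],
])
class_names = ["class_0", "class_1", "class_2"]

# Divide each row by its total so every row sums to 1
cm_normalized = cm / cm.sum(axis=1, keepdims=True)

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(cm_normalized, annot=True, fmt=".2f", cmap="Blues",
            vmin=0, vmax=1,
            xticklabels=class_names, yticklabels=class_names, ax=ax)
ax.set_title("Row-Normalized Confusion Matrix")
ax.set_ylabel("True Label")
ax.set_xlabel("Predicted Label")
plt.close(fig)
```

Note the change from `fmt='d'` to `fmt='.2f'`: the cells now hold fractions rather than integers.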

## 10. Conclusion and Next Steps

Congratulations! You've just completed a comprehensive tour of the Seaborn library, from basic plots to advanced AI/ML applications.

### Key Concepts Covered:
- Setup: Importing seaborn as sns and setting themes.
- Basic Plots: Using histplot to understand distributions.
- Statistical Relationships: Using lmplot and boxplot to relate and compare variables.
- Customization: Modifying aesthetics with set_style, set_palette, and relplot parameters like hue and size.
- Categorical Data: Using catplot to draw split violin plots.
- Advanced Layouts: Creating multi-plot figures with FacetGrid.
- Matrix Plots: Getting a high-level overview with pairplot and heatmap.
- Matplotlib Integration: Placing Seaborn plots on Matplotlib axes with the ax parameter.
- AI/ML Applications: Visualizing feature distributions and model performance (confusion matrix).

### Recommended Next Steps:
1.  Practice: Use the functions you learned here on a new dataset. Try the titanic or fmri datasets (sns.load_dataset('titanic')).
2.  Explore: Visit the official Seaborn example gallery for inspiration and code for many more plot types.
3.  Deepen Knowledge: Read the official documentation for functions that interest you to learn about all their parameters and capabilities.

Happy visualizing in 2025! 🚀