# Seaborn Cheat Sheet for Hackathons

Seaborn is a Python library for statistical graphics. It is built on top of Matplotlib and works directly with Pandas DataFrames, so most plots need only a DataFrame and a few column names. This notebook is a quick reference to its most common plotting functions.

## 1. Setup and Data Loading

Import necessary libraries and load a sample dataset. Seaborn comes with several built-in datasets.

In [None]:
# Import seaborn with its conventional alias 'sns'
import seaborn as sns
# Import matplotlib for plot display and customization
import matplotlib.pyplot as plt
# Import pandas for data manipulation
import pandas as pd

# Set the visual theme for all subsequent plots
sns.set_theme(style="whitegrid")

# Load the built-in 'tips' dataset
tips = sns.load_dataset("tips")

# Display the first few rows of the dataset
print("Sample 'tips' dataset:")
print(tips.head())

## 2. Relational Plots

These plots are used to understand the relationship between two variables.

### Scatter Plot
Shows the relationship between two numeric variables. The `hue` parameter can add a third dimension.

In [None]:
# Create a scatter plot
# x: The variable for the x-axis
# y: The variable for the y-axis
# hue: A variable to color the points by
# data: The DataFrame to use
plt.figure(figsize=(8, 6))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="smoker")
plt.title('Total Bill vs. Tip by Smoker')
plt.show()
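
Beyond `hue`, `scatterplot` also accepts `style` and `size` to encode further variables. The sketch below is a minimal illustration on a small made-up DataFrame (`demo` is a hypothetical stand-in for `tips`, so it runs without downloading anything).

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for a few rows of 'tips' (values are made up)
demo = pd.DataFrame({
    "total_bill": [10.5, 23.1, 15.8, 31.2, 18.4, 27.9],
    "tip": [1.5, 3.8, 2.2, 5.0, 3.0, 4.1],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "size": [2, 4, 2, 5, 3, 4],
})

fig, ax = plt.subplots(figsize=(8, 6))
sns.scatterplot(
    data=demo, x="total_bill", y="tip",
    hue="smoker",    # color encodes the category
    style="smoker",  # marker shape repeats the category (helps in grayscale)
    size="size",     # marker size encodes party size
    ax=ax,
)
ax.set_title("Total Bill vs. Tip by Smoker and Party Size")
# plt.show()  # uncomment when running as a script
```

Encoding the same variable with both `hue` and `style` is redundant on purpose: the groups stay distinguishable when the plot is printed without color.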

## 3. Distribution Plots

These plots help visualize the distribution of a single variable.

### Histogram
Shows the frequency distribution of a numeric variable by counting observations in bins. Setting `kde=True` overlays a kernel density estimate, a smoothed curve that makes the overall shape easier to see.

In [None]:
# Create a histogram of the total_bill
# kde=True adds a line showing the estimated probability density
sns.histplot(data=tips, x="total_bill", kde=True)
plt.title('Distribution of Total Bill')
plt.show()

## 4. Categorical Plots

These plots show the relationship between a numeric variable and one or more categorical variables.

### Box Plot
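A classic plot summarizing a distribution by its quartiles: the box spans the first to third quartile, with a line at the median. By default the whiskers reach the most extreme points within 1.5 times the interquartile range of the box, and anything beyond that is drawn as an individual outlier point, so the whisker ends are not always the min and max.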

In [None]:
# Create a box plot
# Compares the distribution of 'total_bill' across different 'day's
sns.boxplot(data=tips, x="day", y="total_bill")
plt.title('Distribution of Total Bill by Day')
plt.show()
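
To make the box plot's ingredients concrete, the following sketch computes the quartiles, the 1.5 × IQR fences, and the resulting whisker ends and outliers with plain pandas. The values are made up for illustration. Pandas' default linear-interpolation quantiles are used here, which may differ slightly from the plotting library's on other data.

```python
import pandas as pd

# Made-up bill amounts with one unusually large value
bills = pd.Series([8.0, 10.0, 12.0, 14.0, 15.0, 16.0, 18.0, 20.0, 48.0],
                  name="total_bill")

# Quartiles define the box and the median line
q1, median, q3 = bills.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box decide what counts as an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
whisker_low = bills[bills >= lower_fence].min()
whisker_high = bills[bills <= upper_fence].max()
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"box: {q1} to {q3}, median {median}")
print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"outliers: {outliers.tolist()}")
```

Here the maximum (48.0) lies outside the upper fence, so it appears as an outlier and the upper whisker stops at 20.0.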

### Violin Plot
Combines a box plot with a kernel density estimate, so it reveals features of the distribution's shape (such as multiple peaks) that a box plot hides.

In [None]:
# Create a violin plot
# hue groups the data by 'smoker'; split=True draws the two groups
# as the left and right halves of a single violin for direct comparison
sns.violinplot(data=tips, x="day", y="total_bill", hue="smoker", split=True)
plt.title('Distribution of Total Bill by Day and Smoker')
plt.show()
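
The interior of each violin can be changed with the `inner` parameter. As a small sketch on randomly generated data (a stand-in for `tips`, not real values), `inner="quart"` draws the quartiles as lines inside each half instead of a miniature box.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data: two days, two smoker groups, 40 bills per combination
rng = np.random.default_rng(1)
n = 160
demo = pd.DataFrame({
    "day": np.repeat(["Sat", "Sun"], n // 2),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

fig, ax = plt.subplots()
sns.violinplot(
    data=demo, x="day", y="total_bill",
    hue="smoker", split=True,  # one half-violin per smoker group
    inner="quart",             # draw quartile lines inside each half
    ax=ax,
)
ax.set_title("Total Bill by Day and Smoker (quartile lines)")
# plt.show()  # uncomment when running as a script
```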

### Count Plot
Shows the number of occurrences of each category in a categorical variable (essentially a histogram for categories).

In [None]:
# Create a count plot
# Shows how many entries exist for each day
sns.countplot(data=tips, x="day", hue="sex")
plt.title('Count of Records by Day and Sex')
plt.show()
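
The bar heights in a count plot are simple group counts. If you need the numbers themselves (for a table or a sanity check), `pd.crosstab` gives the same tallies. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up records standing in for the 'day' and 'sex' columns of 'tips'
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sat", "Sun", "Sun"],
    "sex": ["Male", "Female", "Male", "Male", "Male", "Female", "Female", "Male"],
})

# Rows are days, columns are sexes, cells are record counts:
# the same numbers a count plot with x="day", hue="sex" would draw as bars
counts = pd.crosstab(demo["day"], demo["sex"])
print(counts)
```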

## 5. Matrix Plots

### Heatmap
Visualizes 2D data, like a correlation matrix, using colors.

In [None]:
# Select only numeric columns for correlation
# (the string "number" avoids needing a NumPy import)
numeric_tips = tips.select_dtypes(include="number")
# Calculate the correlation matrix
correlation_matrix = numeric_tips.corr()

# Create a heatmap
# annot=True writes the data value in each cell
# cmap sets the color map
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix of Numeric Tip Data')
plt.show()
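
A correlation matrix is symmetric, so half of the cells repeat the other half. One common refinement, sketched here on randomly generated data, is to pass a boolean `mask` that hides the upper triangle, and to pin `vmin`/`vmax` to -1 and 1 so the color scale is centered on zero.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up numeric columns standing in for the numeric part of 'tips'
rng = np.random.default_rng(2)
total_bill = rng.gamma(4.0, 5.0, size=100)
demo = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(0, 1, size=100),
    "size": rng.integers(1, 7, size=100),
})
corr = demo.corr()

# True marks cells to hide; k=1 keeps the diagonal visible
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Lower-Triangle Correlation Matrix")
# plt.show()  # uncomment when running as a script
```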

## 6. Multi-plot Grids

### Pair Plot
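Creates a grid of scatterplots for each pair of numeric variables, with each variable's own distribution on the diagonal (histograms by default, layered density curves when `hue` is set). It is a fast way to get an overview of a dataset, though it becomes slow and hard to read with many columns.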

In [None]:
# Create a pair plot
# 'hue' colors the plots by a categorical variable
sns.pairplot(tips, hue="smoker")
plt.suptitle('Pair Plot of Tips Data by Smoker', y=1.02) # Adjust title position
plt.show()