# Introduction to Seaborn

**Seaborn** is a powerful Python library for data visualization built on top of Matplotlib. 

While Matplotlib is great for creating basic plots, Seaborn makes it easier to create attractive, informative statistical visualizations with less code. Since you already have some experience with Matplotlib, the transition should be smooth. Let's dive into the key functionality Seaborn offers.

**Key topics**

1. What is Seaborn and why use it?
2. Seaborn vs Matplotlib
3. Getting Started with Seaborn
4. Built-in Seaborn Datasets
5. Seaborn's Core Plots:
   - Scatter Plots
   - Line Plots
   - Bar Plots
   - Box Plots
   - Violin Plots
   - Pair Plots
   - Regression Plots
   - Histograms
   - KDE Plots
6. Customizing Plots (Themes, Colors, Styles)
7. Combining Seaborn with Matplotlib


___

# 1. What is Seaborn and Why Use It?

**Seaborn** is built on top of Matplotlib and is designed to make data visualization easier and more informative. Some reasons why Seaborn is worth using:

- **Simplifies complex visualizations**: Seaborn automates tasks like setting labels, managing plot aesthetics, and displaying statistical relationships.
- **Integrates well with Pandas**: Since Seaborn is designed to work with Pandas DataFrames, plotting from DataFrames becomes straightforward.
- **Prettier default aesthetics**: Seaborn's visual style is cleaner and more modern by default, which can save time on customization.


___

# 2. Seaborn vs Matplotlib

Seaborn is **not a replacement for Matplotlib** but an extension. Here's a basic comparison:

| Feature                        | Matplotlib                        | Seaborn                                                |
|--------------------------------|-----------------------------------|--------------------------------------------------------|
| Default plot aesthetics        | Basic, but highly customizable    | Clean and modern; little customization needed          |
| Ease of creating complex plots | Can be verbose and code-heavy     | Simplified, often one-liners                           |
| Statistical plots              | Require manual effort             | Built-in support for common statistical visualizations |
| Data handling                  | Works well with arrays and lists  | Works directly with Pandas DataFrames and column names |


____

# 3. Getting Started with Seaborn


Make sure you have Seaborn installed. If you don't have it, install it via:


    pip install seaborn


```python
# Import the libraries used throughout this notebook
import numpy as np
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt
```

____

# 4. Built-in Seaborn Datasets

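Seaborn comes with several example datasets that `sns.load_dataset()` loads directly into a Pandas DataFrame (they are downloaded from the online seaborn-data repository on first use, so an internet connection is needed). One popular dataset is `tips`, which records the total bill, the tip, the sex of the person paying, whether the party included smokers, the day, the time of day, and the party size for 244 restaurant bills.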

Let’s load and inspect the `tips` dataset.


```python
# Load the 'tips' dataset into a DataFrame
tips_df = sns.load_dataset('tips')

# Display the first 5 rows
tips_df.head()
```


```python
# Count how many bills there are for each party size
tips_df['size'].value_counts()
```
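
`value_counts()` returns the counts as a table; `sns.countplot()` draws the same counts as one bar per category. A minimal sketch, using a small made-up `toy_df` (hypothetical values, not the real `tips` data) so it runs without downloading anything; with the real data, pass `tips_df` instead:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical party sizes standing in for tips_df['size']
toy_df = pd.DataFrame({'size': [2, 3, 2, 4, 2, 1, 3, 2]})

# The counts as a table...
counts = toy_df['size'].value_counts()
print(counts)

# ...and as a bar chart with one bar per party size
ax = sns.countplot(x='size', data=toy_df)
plt.show()
```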

____
# 5. Seaborn's Core Plots
## Scatter Plots

A scatter plot shows the relationship between two continuous variables. Seaborn’s `scatterplot()` draws one directly from two DataFrame columns.


```python
# Scatter plot: total bill vs tip
sns.scatterplot(x='total_bill', y='tip', data=tips_df)
plt.show()
```


**Adding categorical differentiation:**

You can color points by a category, such as `sex`, using the `hue` parameter.


```python
# Scatter plot with hue (sex)
# Note that this adds another dimension to our visualisations,
# allowing more information to be conveyed in a single plot
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips_df)
plt.show()
```


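```python
# 'size' is numeric, so hue maps it to a continuous color scale
# instead of a handful of distinct colors
sns.scatterplot(x='total_bill', y='tip', hue='size', data=tips_df)
plt.show()
```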
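
To keep color categorical, a numeric column such as `size` can be mapped to marker size instead, through the `size` parameter of `scatterplot()`. A minimal sketch on a made-up `toy_df` (hypothetical rows that reuse the `tips` column names), so it runs offline:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows with the same column names as tips_df (not real data)
toy_df = pd.DataFrame({
    'total_bill': [10.5, 21.0, 16.9, 24.6, 30.1, 8.8],
    'tip': [1.5, 3.5, 2.0, 3.6, 4.0, 1.3],
    'sex': ['Female', 'Male', 'Female', 'Male', 'Male', 'Female'],
    'size': [2, 3, 2, 4, 4, 1],
})

# Color encodes the categorical 'sex'; marker area encodes the numeric 'size'
ax = sns.scatterplot(x='total_bill', y='tip', hue='sex', size='size', data=toy_df)
plt.show()
```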

## Line Plots

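Line plots are typically used to visualize trends over time. Matplotlib has `plt.plot()`, but Seaborn's `lineplot()` works directly with DataFrame column names and, when several rows share the same x value, aggregates them (the mean by default) and shades a confidence interval around the line.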


```python
# Load another sample dataset: monthly airline passenger counts, 1949 to 1960
flights_df = sns.load_dataset('flights')

flights_df.head()
```


```python
# Line plot of passengers over time. Each year has 12 monthly rows, so
# Seaborn plots their mean and shades a 95% confidence interval around it
sns.lineplot(x='year', y='passengers', data=flights_df)
plt.show()
```

**Grouping by a category**

Seaborn allows you to plot multiple lines, each representing a different category.


```python
# Line plot with hue (month):
# plots a separate line for each month
sns.lineplot(x='year', y='passengers', hue='month', data=flights_df)
plt.show()
```


## Bar Plots 

Bar plots display the relationship between a categorical variable and a numeric variable. By default, `barplot()` plots the mean of the numeric variable for each category, with an error bar showing a 95% confidence interval.


```python
# Bar plot showing the average total bill by day
sns.barplot(x='day', y='total_bill', data=tips_df)
plt.show()
```


## Box Plots
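A box plot is great for visualizing the distribution of a numeric variable. The box spans the interquartile range (IQR) with a line at the median, the whiskers extend to the most extreme points within 1.5 × IQR of the box, and points beyond that are drawn individually as potential outliers.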


```python
# Box plot: total bill by day
sns.boxplot(x='day', y='total_bill', data=tips_df)
plt.show()
```
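
The numbers behind a box can be reproduced with Pandas. A minimal sketch for a single hypothetical day of bills, assuming the 1.5 × IQR whisker rule that `boxplot()` uses by default (`whis=1.5`):

```python
import pandas as pd

# Hypothetical total bills for a single day
bills = pd.Series([8.0, 12.0, 14.0, 15.0, 17.0, 19.0, 22.0, 48.0])

# Box edges and the line inside it: first quartile, median, third quartile
q1, median, q3 = bills.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Whiskers reach the furthest points within 1.5 * IQR of the box;
# anything beyond these fences is drawn as an individual outlier point
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f'Q1={q1}, median={median}, Q3={q3}, outliers={outliers.tolist()}')
```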


**Box plots grouped by another variable:**

You can group the data further using `hue`.


```python
# Box plot with hue (sex)
sns.boxplot(x='day', y='total_bill', hue='sex', data=tips_df)
plt.show()
```


## Violin Plots

A violin plot combines aspects of a box plot and a kernel density plot. It provides insight into the distribution and density of the data.


```python
# Violin plot showing the distribution of total bill by day
sns.violinplot(x='day', y='total_bill', data=tips_df)
plt.show()
```


**Violin plots can also be grouped using `hue`:**


```python
# Violin plot with hue (sex)
sns.violinplot(x='day', y='total_bill', hue='sex', data=tips_df)
plt.show()
```
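
With exactly two `hue` levels, `split=True` draws one level on each half of the violin, which makes the comparison more compact. A minimal sketch on a hypothetical `toy_df`:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bills: three per sex on each of two days
toy_df = pd.DataFrame({
    'day': ['Sat'] * 6 + ['Sun'] * 6,
    'total_bill': [12.0, 18.5, 22.0, 15.0, 27.5, 20.0,
                   14.0, 19.0, 25.5, 16.5, 30.0, 21.0],
    'sex': ['Male', 'Female'] * 6,
})

# split=True draws one sex on each half of every violin
ax = sns.violinplot(x='day', y='total_bill', hue='sex', data=toy_df, split=True)
plt.show()
```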


## Pair Plots

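A `pairplot()` creates a grid of plots for every pair of numeric columns in the dataset: scatter plots off the diagonal and a histogram of each individual variable on the diagonal. For `tips`, that means `total_bill`, `tip`, and `size`; non-numeric columns are ignored.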

```python
sns.pairplot(tips_df)
plt.show()
```


You can also choose which columns to include with `vars`, or add `hue` to color-code by a category (with `hue`, the diagonal shows KDE curves instead of histograms by default).


```python
sns.pairplot(tips_df, hue='sex')
plt.show()
```

## Regression Plots
Regression plots show the linear relationship between two variables. `regplot()` draws a scatter plot, fits a linear regression line, and shades a 95% confidence interval around it.


```python
# Regression plot: total bill vs tip
sns.regplot(x='total_bill', y='tip', data=tips_df)
plt.show()
```
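
`regplot()` draws the fitted line but does not return its coefficients. To get the numbers, one option is to fit a least-squares straight line yourself with NumPy's `np.polyfit`. A sketch on hypothetical points that lie exactly on tip = 0.15 × total_bill, so the fit should recover a slope of 0.15 and an intercept of 0:

```python
import numpy as np

# Hypothetical (total_bill, tip) pairs lying exactly on tip = 0.15 * total_bill
total_bill = np.array([10.0, 20.0, 30.0, 40.0])
tip = np.array([1.5, 3.0, 4.5, 6.0])

# Least-squares fit of a degree-1 polynomial returns [slope, intercept]
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f'tip = {slope:.3f} * total_bill + {intercept:.3f}')
```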


## Histograms

Seaborn's `histplot()` offers flexible binning: `bins` sets the number of bins, and `kde=True` overlays a KDE curve.


```python
# Histogram of total bills with 30 bins and a KDE curve overlaid
sns.histplot(data=tips_df, x='total_bill', kde=True, bins=30)
plt.show()
```

As before, we can use `hue` to group our data.

```python
# Histogram of total bills, split by sex
sns.histplot(data=tips_df, x='total_bill', kde=True, bins=30, hue='sex')
plt.show()
```



## KDE Plots

A kernel density estimate (KDE) plot shows a smoothed estimate of the probability density function of a continuous variable.


```python
# KDE plot of total bill
sns.kdeplot(data=tips_df, x='total_bill')
plt.show()
```


```python
# One density curve per sex
sns.kdeplot(data=tips_df, x='total_bill', hue='sex')
plt.show()
```

____

# 6. Customizing Plots (Themes, Colors, Styles)

Seaborn makes it easy to customize the appearance of plots using themes and color palettes.

## Setting a Theme

You can choose from different themes such as `darkgrid`, `whitegrid`, `dark`, `white`, and `ticks`.


```python
# Set the theme to 'whitegrid'; this applies to every plot created afterwards
sns.set_theme(style='whitegrid')

# Example plot
sns.scatterplot(x='total_bill', y='tip', data=tips_df)
plt.show()
```
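
`set_theme()` changes the style globally. To use a style for one figure only, `sns.axes_style()` also works as a context manager, and calling `sns.set_theme()` with no arguments restores Seaborn's default theme. A minimal sketch on a hypothetical `toy_df`:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical bill/tip pairs
toy_df = pd.DataFrame({
    'total_bill': [10.5, 21.0, 16.9, 24.6, 30.1],
    'tip': [1.5, 3.5, 2.0, 3.6, 4.0],
})

# The 'darkgrid' style applies only to plots created inside the with block
with sns.axes_style('darkgrid'):
    ax = sns.scatterplot(x='total_bill', y='tip', data=toy_df)
plt.show()

# With no arguments, set_theme() restores Seaborn's defaults
# (including the 'darkgrid' style and the 'deep' palette)
sns.set_theme()
```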


## Color Palettes

Seaborn allows you to choose from built-in color palettes, or you can create your own custom palette.


```python
# Set a color palette for all subsequent plots
# Other built-in options include 'deep' (the default), 'muted', 'bright', 'dark', and 'colorblind'
sns.set_palette('pastel')

# Example plot using the first pastel color
sns.histplot(data=tips_df, x='total_bill', kde=True, bins=30)
plt.show()
```
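
Palettes can also be created as objects with `sns.color_palette()` and passed to a single plot through `palette=`, leaving the global setting untouched. A minimal sketch; the hex codes in `custom` are arbitrary picks for illustration, and `toy_df` is hypothetical:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# First three colors of the built-in colorblind-friendly palette (RGB tuples)
colorblind = sns.color_palette('colorblind', 3)
print(colorblind)

# Custom palette from hex codes (arbitrary choices for illustration)
custom = sns.color_palette(['#1b9e77', '#d95f02'])

# Hypothetical rows standing in for tips_df
toy_df = pd.DataFrame({
    'total_bill': [10.5, 21.0, 16.9, 24.6],
    'tip': [1.5, 3.5, 2.0, 3.6],
    'sex': ['Female', 'Male', 'Female', 'Male'],
})

# palette= affects this plot only; the global palette is unchanged
ax = sns.scatterplot(x='total_bill', y='tip', hue='sex', data=toy_df, palette=custom)
plt.show()
```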



____

# 7. Combining Seaborn with Matplotlib

Since Seaborn is built on top of Matplotlib, you can combine Seaborn's high-level plotting with Matplotlib's customization capabilities. For example, you can modify titles, axes, or legends with Matplotlib commands.


```python
# Seaborn plot with Matplotlib customization
sns.scatterplot(x='total_bill', y='tip', hue='sex', data=tips_df)

plt.title('Scatter Plot of Total Bill vs Tip by Gender')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
```
