# 📘 Seaborn Introduction

## 🔧 1. Setup and Imports

In [None]:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load sample dataset
tips = sns.load_dataset("tips")
tips.head()


**Explanation:**
- Import the three libraries used throughout: Seaborn (statistical graphics), Pandas (data manipulation), and Matplotlib (the plotting layer that Seaborn is built on).
- `sns.load_dataset("tips")` loads the `tips` example dataset as a Pandas DataFrame. The data is downloaded from Seaborn's online example-data repository on first use and cached locally, so the first call needs an internet connection. Each row records one restaurant bill: bill amount, tip, gender, smoking status, day, time, and party size.
- `tips.head()` displays the first five rows, which is a quick way to check the column names and typical values before planning your analysis and visualizations. To see the column types explicitly, use `tips.dtypes` or `tips.info()`.
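
As a complement to `head()`, the sketch below checks column types directly. It uses a tiny hand-made table (`toy`, with made-up values and only some of the `tips` column names) so that it runs without downloading anything; on the real dataset you would call the same methods on `tips`.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of `tips`; the values are made up.
toy = pd.DataFrame({
    "total_bill": [12.5, 20.0, 31.2],
    "tip": [2.0, 3.5, 5.0],
    "day": pd.Categorical(["Thur", "Sat", "Sun"]),
    "size": [2, 3, 4],
})

print(toy.head())    # first rows: column names and sample values
print(toy.dtypes)    # one dtype per column
print(toy.shape)     # (number of rows, number of columns)

# Columns that numeric plots such as scatter plots can use directly.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Checking `dtypes` early is a cheap way to confirm which columns are numeric and which are categorical before choosing a plot type.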

## 📈 2A. Scatter Plot

In [None]:

sns.scatterplot(x="total_bill", y="tip", data=tips)
plt.title("Total Bill vs Tip")
plt.show()


**Explanation:**
- This scatter plot visualizes the relationship between the total bill and the tip amount for each transaction in the dataset.
- Each point on the plot represents a single record, showing how much was spent and how much was tipped.
- By plotting these two variables, you can observe patterns such as whether higher bills tend to result in higher tips, and identify any outliers or clusters in the data.
- The points trend upward from left to right, which indicates a positive correlation: as the total bill increases, the tip amount generally increases as well.
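
To put a number on the trend a scatter plot shows, one option is the Pearson correlation coefficient. The sketch below uses made-up values in a hypothetical `toy` table, not the real `tips` data; on the real dataset the same call would be `tips["total_bill"].corr(tips["tip"])`.

```python
import pandas as pd

# Made-up values for illustration only; not taken from `tips`.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 40.0],
    "tip": [1.5, 2.0, 3.5, 3.0, 6.0],
})

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear trend,
# and -1 is a perfect decreasing line.
r = toy["total_bill"].corr(toy["tip"])
print(f"Pearson r = {r:.2f}")
```

A coefficient only summarizes the linear part of the relationship, so it is best read alongside the plot rather than instead of it.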

## 📈 2B. Histogram with KDE

In [None]:

sns.histplot(tips["total_bill"], kde=True, bins=20)
plt.title("Distribution of Total Bill")
plt.show()


**Explanation:**
- This histogram displays the distribution of the `total_bill` values in the dataset, showing how frequently different bill amounts occur.
- The KDE (Kernel Density Estimate) curve overlays the histogram, providing a smoothed estimate of the data's probability density, which helps to visualize the underlying distribution shape.
- Setting `bins=20` splits the range of `total_bill` into 20 equal-width intervals, which makes it easier to see where most bills fall and to spot skewness or multiple peaks.
- This visualization is useful for understanding the typical bill size and identifying any unusual values or trends in spending.
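
The binning step can be sketched with NumPy, assuming equal-width bins over the observed range (which is how `numpy.histogram` treats an integer `bins` argument). The `bills` array holds made-up values, not the real `total_bill` column.

```python
import numpy as np

# Made-up bill amounts for illustration only.
bills = np.array([8.0, 10.5, 12.0, 15.5, 16.0, 18.0, 21.0, 24.5, 30.0, 48.0])

counts, edges = np.histogram(bills, bins=20)
widths = np.diff(edges)

print(len(edges))     # 20 bins are bounded by 21 edges
print(widths[0])      # each bin spans (max - min) / 20 of the range
print(counts.sum())   # every value lands in exactly one bin
```

Fewer bins smooth over detail and more bins expose noise, so it is worth trying a few values of `bins` before settling on one.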

## 📈 2C. Box Plot

In [None]:

sns.boxplot(x="day", y="total_bill", data=tips)
plt.title("Total Bill by Day")
plt.show()


**Explanation:**
- The box plot compares the distribution of total bill amounts across the days recorded in the dataset (Thursday through Sunday), allowing you to see differences in spending patterns across days.
- The box represents the interquartile range (IQR), which contains the middle 50% of the data, while the line inside the box marks the median value for each day.
- Whiskers extend to the smallest and largest values within 1.5 times the IQR from the quartiles, helping to identify the spread and variability of the data.
- Points outside the whiskers are considered outliers, which may indicate unusual spending behavior on certain days.
- This plot is valuable for detecting trends, comparing groups, and spotting days with higher or lower bills.
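
The quartile, whisker, and outlier rules described above can be worked through by hand. The sketch below uses a made-up `values` array and NumPy's default (linear interpolation) percentiles; the variable names are illustrative only.

```python
import numpy as np

# Made-up data with one obviously extreme value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1                     # height of the box

# Fences sit 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points that are still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences is drawn as an individual outlier point.
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, iqr)
print(whisker_low, whisker_high, outliers)
```

Note that the whiskers end at actual data points, not at the fences themselves, which is why whisker lengths usually differ above and below the box.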

## 📈 2D. Violin Plot

In [None]:

sns.violinplot(x="day", y="total_bill", data=tips)
plt.title("Violin Plot of Total Bill by Day")
plt.show()


**Explanation:**
- The violin plot visualizes the distribution and density of total bill amounts for each day, combining features of box plots and KDE plots.
- The width of each violin at a given value indicates the density of data points, making it easy to see where values are concentrated or sparse.
- The miniature box plot drawn inside each violin (the default `inner="box"` setting) marks the interquartile range and the median, providing summary statistics similar to a box plot.
- This plot is especially useful for comparing both the spread and shape of the data across categories, revealing multimodal distributions or asymmetries that box plots alone might miss.
- It helps you understand not just the range, but also the underlying patterns in the data for each day.
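
Because the violin outline is a KDE, by default it extends somewhat past the smallest and largest observations (the `cut` parameter defaults to 2 bandwidths). The sketch below, on a made-up `toy` table rather than the real `tips` data, sets `cut=0` so that each violin stops at the observed range.

```python
import pandas as pd
import seaborn as sns

# Made-up bills for two days; illustration only.
toy = pd.DataFrame({
    "day": ["Sat"] * 5 + ["Sun"] * 5,
    "total_bill": [10.0, 12.0, 15.0, 18.0, 30.0,
                   14.0, 16.0, 17.0, 22.0, 25.0],
})

# cut=0 limits each violin to the range of the observed data.
ax = sns.violinplot(x="day", y="total_bill", data=toy, cut=0)
ax.set_title("Total Bill by Day (cut=0)")
```

Truncating with `cut=0` can be the safer choice for quantities such as bills that cannot be negative, since the default tails may otherwise suggest impossible values.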

## 📈 2E. Bar Plot

In [None]:

# errorbar="sd" needs seaborn 0.12 or later; older versions used ci="sd"
sns.barplot(x="day", y="tip", data=tips, errorbar="sd")
plt.title("Average Tip by Day")
plt.show()


**Explanation:**
- This bar plot displays the average tip amount for each day in the dataset (Thursday through Sunday), helping you compare tipping behavior across different days.
- The height of each bar represents the mean tip value for that day, providing a quick visual summary of central tendency.
- Error bars show the standard deviation, indicating how much tip amounts vary from the average, which helps assess consistency and variability.
- This visualization is useful for identifying days with higher or lower average tips; explaining those differences (for example, by party size or time of day) requires further analysis.
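
The numbers behind such a bar plot can be reproduced with a Pandas group-by. This is a sketch on a made-up `toy` table, assuming the sample standard deviation that Pandas computes by default; on the real data the same pattern would be `tips.groupby("day")["tip"].agg(...)`.

```python
import pandas as pd

# Made-up tips for two days; illustration only.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Sat", "Sat", "Sat"],
    "tip": [2.0, 4.0, 3.0, 3.0, 6.0],
})

# mean  -> bar height
# std   -> length of the error bar on each side of the mean
# count -> how many records each bar is based on
summary = toy.groupby("day")["tip"].agg(["mean", "std", "count"])
print(summary)
```

Including `count` alongside the mean is a useful habit, because a bar built from very few records deserves less weight than its height suggests.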

## 📈 2F. Count Plot

In [None]:

sns.countplot(x="day", data=tips)
plt.title("Number of Records per Day")
plt.show()


**Explanation:**
- The count plot shows the number of records (visits) for each day, making it easy to see which days are most and least frequent in the dataset.
- Each bar's height corresponds to the count of occurrences for that day, providing a clear view of the distribution of categorical data.
- This plot is useful for detecting imbalances, such as days with more or fewer entries, which can affect analysis and interpretation.
- Understanding the frequency of each category helps ensure that comparisons between groups are fair and meaningful.
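
The bar heights in a count plot are plain category counts, which `value_counts` returns directly. The sketch below uses a made-up `days` series; on the real data the call would be `tips["day"].value_counts()`.

```python
import pandas as pd

# Made-up day labels; illustration only.
days = pd.Series(["Sat", "Sun", "Sat", "Thur", "Sat", "Sun"], name="day")

counts = days.value_counts()                 # records per category
shares = days.value_counts(normalize=True)   # same, as a fraction of all rows

print(counts)
print(shares)
```

Looking at the shares as well as the raw counts makes imbalances between categories easier to judge at a glance.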

## 📈 2G. Pair Plot

In [None]:

sns.pairplot(tips, hue="sex")
plt.suptitle("Pairwise Plots by Sex", y=1.02)
plt.show()


**Explanation:**
- The pair plot draws a scatter plot for every pair of numerical columns in the dataset (`total_bill`, `tip`, and `size`) and, on the diagonal, the distribution of each variable on its own. When `hue` is set, as it is here, the diagonal defaults to KDE curves rather than histograms.
- Coloring the points by the `sex` column allows you to compare distributions and trends between male and female customers.
- This visualization helps identify correlations, clusters, and outliers, as well as differences in behavior between groups.
- Pair plots are especially valuable for exploratory data analysis, revealing patterns that may warrant further investigation or modeling.

## 🎨 3. Set Seaborn Theme

In [None]:

sns.set_theme(style="darkgrid")


**Explanation:**
- `sns.set_theme(style="darkgrid")` applies a consistent style to every plot created after the call, so it is usually placed at the top of a notebook, right after the imports.
- The `darkgrid` style draws white grid lines on a gray background, which can improve readability and make it easier to read values off the axes.
- A shared theme keeps your visualizations visually consistent, which helps when sharing results with others or presenting findings.
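
To see what a style actually changes, or to apply one to a single figure without touching the rest of the notebook, `sns.axes_style` can be used as shown in this sketch. The plotted numbers are arbitrary and the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Look up the settings behind a style without changing any global state.
darkgrid = sns.axes_style("darkgrid")
white = sns.axes_style("white")
print(darkgrid["axes.grid"], white["axes.grid"])

# Apply a style to one figure only; earlier settings are restored afterwards.
with sns.axes_style("whitegrid"):
    grid_inside_block = plt.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.set_title("Styled with whitegrid inside this block only")
```

The context-manager form is handy when one figure needs a different look, while `sns.set_theme` remains the simpler choice for a notebook-wide default.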

## 🧩 4. Box Plot by Gender and Smoker

In [None]:

sns.boxplot(x="sex", y="tip", hue="smoker", data=tips)
plt.title("Tip by Gender and Smoking Status")
plt.show()


**Explanation:**
- This grouped box plot compares tip amounts across gender and smoking status, allowing you to analyze how these two categorical variables interact to affect tipping behavior.
- Each box represents the distribution of tips for a specific gender and smoker/non-smoker group, showing median, interquartile range, and outliers.
- By visualizing these groups together, you can detect differences in tipping patterns, such as whether smokers tip more or less than non-smokers, and whether gender influences tipping.
- This plot is useful for spotting possible interactions between the two variables, which can then be checked with further statistical analysis.
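
The median line of each box in a grouped box plot corresponds to a two-key group-by. The sketch below uses a made-up `toy` table; on the real data the grouping would be `tips.groupby(["sex", "smoker"])["tip"]`.

```python
import pandas as pd

# Made-up tips for each sex / smoker combination; illustration only.
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "smoker": ["Yes", "No", "No", "Yes", "Yes", "No"],
    "tip": [3.0, 2.0, 4.0, 2.5, 3.5, 5.0],
})

# One median per (sex, smoker) group, reshaped so smoker status forms the columns.
medians = toy.groupby(["sex", "smoker"])["tip"].median().unstack("smoker")
print(medians)
```

Reading the table row by row mirrors reading the plot box by box, which makes it a quick cross-check on what the figure appears to show.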

## 🧠 Summary Table


| Function         | Purpose                                       |
|------------------|-----------------------------------------------|
| `scatterplot`    | Relationship between two numerical variables  |
| `histplot`       | Distribution of a single numerical variable   |
| `boxplot`        | Summary stats and outliers by category        |
| `violinplot`     | Distribution shape (KDE) by category          |
| `barplot`        | Mean and error bars for categories            |
| `countplot`      | Frequency counts for categories               |
| `pairplot`       | Pairwise scatter plots plus distributions     |
