In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv(r"C:\Basic_Datascience_4ML\assets\data\tips.csv")
df.head()  # not rendered: a notebook cell only displays its last expression
df.sample(20)

# 📌 Categorical Data Plots – Theory

Categorical plots are used to visualize the distribution and relationship of **categorical variables** with counts or numerical values. They are essential for exploring patterns, class balance, and group comparisons.

## Types of Categorical Plots

1. **Countplot**
   - Displays frequency of each category.
   - Example: Number of male vs female passengers.

2. **Barplot**
   - Shows an aggregate (mean/median/sum) of a numerical variable across categories.
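   - Example: Average fare per passenger class.
   - Put the categorical variable on one axis and the numeric variable on the other: x = categorical, y = numeric gives vertical bars; swapping them gives horizontal bars.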

3. **Boxplot**
   - Summarizes distribution using median, quartiles, and outliers for each category.
   - Example: Age distribution by survival status.
   - Boxplots are covered in more detail in `data prepare`.

4. **Violin Plot**
   - Combines boxplot with KDE to show both summary statistics and distribution density.
   - Example: Salary distribution by job role.

5. **Stripplot**
   - Displays individual observations per category (with jitter).
   - Best for small datasets.
   - A stripplot draws a scatterplot in which one of the variables is categorical.

6. **Swarmplot**
   - Similar to stripplot but adjusts points to avoid overlap.
   - Example: Test scores grouped by section.
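
To make the boxplot summary concrete, here is a minimal sketch of the numbers a boxplot is built from. The data is a hypothetical toy series (not taken from `tips.csv`), and the 1.5 × IQR fence is the usual default whisker rule; treat it as an illustration rather than the exact internals of `sns.boxplot`.

```python
import pandas as pd

# Hypothetical toy values standing in for a numeric column such as `tip`.
values = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0])

# The box spans Q1 to Q3; the line inside it is the median.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Common default rule: points more than 1.5 * IQR beyond the box
# are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

A violin plot shows the same values as a smoothed density, while strip and swarm plots draw every value as a point.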

## When to Use
- **Countplot** → Frequency of categories.
- **Barplot** → Compare numeric values across categories.
- **Box/Violin** → Understand spread, distribution, and outliers.
- **Strip/Swarm** → Visualize individual data points within categories.
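
The difference between a countplot and a barplot is easiest to see in the aggregation each one performs before drawing. The sketch below reproduces those aggregations with plain pandas on a hypothetical miniature table (the values are invented for illustration, and `toy` is not part of the notebook's data).

```python
import pandas as pd

# Hypothetical miniature of the tips data; values are made up.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male"],
    "tip": [3.0, 2.0, 5.0, 5.0, 1.0],
})

# Countplot: one bar per category, height = number of rows in that category.
counts = toy["sex"].value_counts()

# Barplot (default estimator): mean of the numeric column per category.
mean_tip = toy.groupby("sex")["tip"].mean()

# Barplot with estimator=np.sum: per-category total instead of the mean.
total_tip = toy.groupby("sex")["tip"].sum()

print(counts, mean_tip, total_tip, sep="\n\n")
```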


In [None]:
sns.countplot(data=df,x="sex",hue="smoker",palette="Set2")

In [None]:
# Total tip per sex; without estimator, barplot shows the mean
sns.barplot(data=df, x='sex', y="tip", estimator=np.sum)

In [None]:
# Numeric variable on x and categorical on y gives horizontal boxes
sns.boxplot(data=df, x='tip', y='day')

In [None]:
sns.violinplot(data=df,x='day',y='tip',palette='Set2',hue='sex')

In [None]:
sns.stripplot(data=df,x='day',y='tip',hue='sex',palette='Set2')

In [None]:
sns.swarmplot(data=df,x='day',y='tip',hue='sex',palette='Set2')

In [None]:
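# Overlay the raw points on the violins; dodge=True splits the points by hue
# so they line up with the side-by-side violins
sns.violinplot(data=df, x='day', y='tip', hue='sex', palette='Set2')
sns.stripplot(data=df, x='day', y='tip', hue='sex', palette='Set2', dodge=True, linewidth=1)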

# Matrix Plots – Theory

A **matrix plot** visualizes data arranged as a 2D matrix (rows × columns). It helps identify patterns, correlations, and clusters across variables. Each cell represents one value and is usually color-coded.

## Common Types of Matrix Plots

1. **Heatmap**
   - Shows data values as color-shaded cells.
   - Useful for correlation matrices or frequency tables.
   - Example: Visualizing correlation between features in a dataset.

2. **Cluster Map**
   - Heatmap with hierarchical clustering (rows and columns are reordered based on similarity).
   - Useful for detecting groups and patterns.
   - Example: Gene expression data analysis.

3. **Pivot Table Heatmap**
   - Matrix derived from categorical grouping and aggregation, visualized as a heatmap.
   - Example: Sales across regions and months.
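
The reordering step behind a cluster map can be sketched with SciPy's hierarchical clustering. This is an illustration under assumptions: the matrix `m` is hypothetical, only the rows are reordered here (a cluster map also reorders columns), and the `average` linkage with Euclidean distance is chosen to match what I understand to be `sns.clustermap`'s defaults.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list

# Hypothetical matrix: rows "a" and "c" are similar, row "b" is different.
m = pd.DataFrame(
    [[1.0, 9.0, 1.1],
     [9.0, 1.0, 9.2],
     [1.2, 9.1, 1.0]],
    index=["a", "b", "c"],
    columns=["x", "y", "z"],
)

# Build the row dendrogram, then read off the leaf order.
row_linkage = linkage(m.values, method="average", metric="euclidean")
row_order = leaves_list(row_linkage)

# Reordering places similar rows next to each other.
reordered = m.iloc[row_order]
print(reordered)
```

After reordering, rows `a` and `c` sit next to each other, which is what makes blocks of similar values visible in the heatmap.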

## When to Use Matrix Plots
- To explore **relationships between multiple variables**.
- To detect **correlations** (positive/negative).
- To identify **clusters, similarities, or anomalies**.
- To visualize **large tabular data** in a compact, interpretable way.
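
A heatmap only colors a matrix that already exists, so it helps to look at that matrix first. The sketch below builds the two kinds of input used in this notebook, a correlation matrix and a pivot table, from a hypothetical toy table (column names mirror the tips data, but the values are invented).

```python
import pandas as pd

# Hypothetical toy table; total_bill is exactly 10 * tip, so they correlate perfectly.
toy = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "sex": ["Male", "Male", "Female", "Male", "Female", "Female"],
    "total_bill": [10.0, 30.0, 20.0, 40.0, 50.0, 70.0],
    "tip": [1.0, 3.0, 2.0, 4.0, 5.0, 7.0],
})

# Correlation matrix: square and symmetric, one row and column per numeric column.
corr = toy.select_dtypes(include="number").corr()

# Pivot table: one row per day, one column per sex, each cell the mean tip.
pivot = toy.pivot_table(values="tip", index="day", columns="sex", aggfunc="mean")

print(corr)
print(pivot)
```

Either `corr` or `pivot` can be passed directly to `sns.heatmap`.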


In [None]:
# Correlation Heatmap (only numeric columns)
sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, cmap="coolwarm")
plt.show()

# Cluster Map
sns.clustermap(df.select_dtypes(include='number').corr(), annot=True, cmap="viridis")
plt.show()

# Pivot Heatmap (Average Tip by Day and Gender)
sns.heatmap(
    df.pivot_table(values="tip", index="day", columns="sex", aggfunc="mean"),
    annot=True, cmap="YlGnBu"
)
plt.show()
