# 📊 Data Visualization with Matplotlib and Seaborn

This notebook covers data visualization techniques with two Python libraries, Matplotlib and Seaborn: histograms, boxplots, scatterplots, heatmaps, violin plots, and bar charts.

We'll explore the `diamonds` dataset from Seaborn.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
diamonds = sns.load_dataset('diamonds')
diamonds.head()

## 📈 Distribution of Diamond Prices
Let's look at the distribution of the `price` column using a histogram with a KDE overlay.

In [None]:
plt.figure(figsize=(10,6))
sns.histplot(diamonds['price'], kde=True, color='skyblue')
plt.title('Distribution of Diamond Prices')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

What does it show?

- The histogram groups prices into "bins" and counts the number of diamonds per price range.

- The KDE (kernel density estimate) curve smooths the histogram to show the overall shape of the distribution.
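
To make the two ideas above concrete, here is a minimal plain-Python sketch of equal-width binning and a Gaussian kernel density estimate. The helper names `bin_counts` and `gaussian_kde`, the toy prices, and the bandwidth of 200 are illustrative assumptions. Seaborn chooses bin edges and the bandwidth automatically, so its output will not match these numbers exactly.

```python
import math

def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Assumes the values are not all identical (the bin width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x.

    Each data point contributes one Gaussian bump; the estimate is their average.
    """
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return density

# Hypothetical toy prices, not taken from the diamonds dataset.
prices = [300, 350, 400, 450, 500, 900, 1200, 4000]

print(bin_counts(prices, n_bins=4))  # [7, 0, 0, 1]

density = gaussian_kde(prices, bandwidth=200.0)
# The curve is higher where the data cluster than out in the tail.
print(density(400) > density(4000))
```

A larger bandwidth gives a smoother curve that hides detail, while a smaller one follows individual points more closely.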

Why is it useful?

- It reveals skew in the price distribution (for example, many cheap diamonds and few very expensive ones).

- It helps you detect outliers.
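
Both points can also be checked numerically. The sketch below computes a moment-based skewness and flags outliers with the common 1.5 × IQR rule, using only the standard library. The helper names `skewness` and `iqr_outliers` and the toy prices are assumptions for illustration, not part of the notebook.

```python
import statistics

def skewness(values):
    """Moment-based skewness: positive values indicate a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical toy prices, not taken from the diamonds dataset.
prices = [300, 350, 400, 450, 500, 900, 1200, 4000]

# In a right-skewed sample the mean is pulled above the median.
print(statistics.fmean(prices) > statistics.median(prices))
print(skewness(prices) > 0)
print(iqr_outliers(prices))  # [4000]
```

To try this on the real data, a column can be passed as a plain list, for example `iqr_outliers(diamonds['price'].tolist())`.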

## 📦 Boxplot of Price by Cut
This boxplot shows how price varies with cut quality.

In [None]:
plt.figure(figsize=(10,6))
sns.boxplot(x='cut', y='price', data=diamonds)
plt.title('Diamond Price by Cut Quality')
plt.xlabel('Cut')
plt.ylabel('Price')
plt.grid(True)
plt.show()

What does it show?

- A comparison of prices across cut qualities.

- The median, quartiles, and outliers for each category.

Why is it useful?

- It shows whether the price distribution differs across cut qualities.

- It highlights variability and the presence of outliers within each cut.
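
The numbers behind each box can be computed directly. This is a rough sketch that groups rows by category and reports the median and interquartile range (IQR) per group. The helper `summarize_by_group` and the toy rows are hypothetical, and each group is assumed to contain at least two values.

```python
import statistics
from collections import defaultdict

def summarize_by_group(rows, group_key, value_key):
    """Map each category to the median, IQR, and size of its values."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    summary = {}
    for name, values in groups.items():
        # Q1, median, and Q3 are the three lines that make up the box.
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summary[name] = {"median": med, "iqr": q3 - q1, "n": len(values)}
    return summary

# Hypothetical toy rows, not taken from the diamonds dataset.
rows = [
    {"cut": "Fair", "price": 400},
    {"cut": "Fair", "price": 600},
    {"cut": "Fair", "price": 800},
    {"cut": "Ideal", "price": 500},
    {"cut": "Ideal", "price": 1500},
    {"cut": "Ideal", "price": 2500},
]

for cut, stats in summarize_by_group(rows, "cut", "price").items():
    print(cut, stats)
```

A shift in the median between groups suggests the category is associated with price, and a larger IQR means more spread within that group. On the full dataset, `diamonds.to_dict('records')` should produce rows in the shape this helper expects.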

## 🎯 Scatterplot: Carat vs Price
Visualize the relationship between carat and price, with points colored by cut.

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x='carat', y='price', hue='cut', data=diamonds, alpha=0.6)
plt.title('Carat vs Price by Cut')
plt.xlabel('Carat')
plt.ylabel('Price')
plt.grid(True)
plt.legend(title='Cut')
plt.show()

## 🔥 Correlation Heatmap
Explore correlations between numeric variables.

In [None]:
plt.figure(figsize=(8,6))
corr = diamonds[['carat', 'depth', 'table', 'price', 'x', 'y', 'z']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Heatmap of Diamond Features')
plt.show()
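
Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of `DataFrame.corr()`. The plain-Python sketch below shows the calculation for one pair of columns. The function name `pearson` and the toy values are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Co-variation of the two variables around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not taken from the diamonds dataset.
carat = [0.2, 0.3, 0.5, 0.7, 1.0]
price = [350, 500, 1400, 2500, 5000]

# A value near +1 means the two variables rise together almost linearly.
print(round(pearson(carat, price), 3))
```

The coefficient only measures linear association, so a curved relationship can produce a value below 1 even when one variable determines the other closely.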

## 📌 Histogram of Carat Values

In [None]:
plt.figure(figsize=(10,6))
sns.histplot(diamonds['carat'], bins=30, kde=True, color='green')
plt.title('Distribution of Diamond Carat')
plt.xlabel('Carat')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

## 📌 Violinplot of Price by Color

In [None]:
plt.figure(figsize=(10,6))
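# Assigning hue avoids the palette-without-hue deprecation warning (needs seaborn 0.13+)
sns.violinplot(x='color', y='price', hue='color', data=diamonds, palette='muted', legend=False)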
plt.title('Violinplot of Diamond Price by Color')
plt.xlabel('Color Grade')
plt.ylabel('Price ($)')
plt.grid(True)
plt.show()

## 📌 Bar Chart of Diamond Counts by Clarity

In [None]:
plt.figure(figsize=(10,6))
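# Count diamonds per clarity grade, keeping the grades in their category order
clarity_counts = diamonds['clarity'].value_counts().sort_index()
# Assigning hue avoids the palette-without-hue deprecation warning (needs seaborn 0.13+)
sns.barplot(x=clarity_counts.index, y=clarity_counts.values, hue=clarity_counts.index, palette='deep', legend=False)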
plt.title('Number of Diamonds by Clarity')
plt.xlabel('Clarity Grade')
plt.ylabel('Count')
plt.grid(True)
plt.show()