Introduction to visualization 


---

### 2.3.1. Visualization using Matplotlib

**Matplotlib** is the most widely used plotting library in Python.  
In **machine learning**, it is especially useful for quickly exploring and understanding data.  

We will cover the most essential plots:
- Line Plot
- Scatter Plot
- Boxplot
- Bar Plot


In [None]:
# Import the plotting library
import matplotlib.pyplot as plt
# Import NumPy for numerical computing
import numpy as np


#### Line Plot

In [None]:
# Data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Plot
plt.figure(figsize=(7,4))
plt.plot(x, y1, label="sin(x)", color="blue")
plt.plot(x, y2, label="cos(x)", color="orange", linestyle="--")

plt.title("Line Plot Example")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


#### Scatter Plot

In [None]:
# Random data
x = np.random.rand(100)
y = np.random.rand(100)

plt.figure(figsize=(7,4))
plt.scatter(x, y, color="red", marker="D")
plt.title("Scatter Plot Example")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.tight_layout()
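# To save the figure, call plt.savefig(path) here, before plt.show();
# after plt.show() the figure is typically already closed and the saved file is blank.
plt.show()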
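
Saving a figure to disk works best when the file is written before the figure is shown or closed. The sketch below is one way to do that with the object-oriented interface. The output file name `scatter_example.png`, the temporary directory, and the seeded generator `rng` are illustrative assumptions, not part of the original cell.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

# Seeded generator (an assumption) so the figure is reproducible
rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)

fig, ax = plt.subplots(figsize=(7, 4))
ax.scatter(x, y, color="red", marker="D")
ax.set_title("Scatter Plot Example")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
fig.tight_layout()

# Hypothetical output location; replace it with your own path
out_path = os.path.join(tempfile.gettempdir(), "scatter_example.png")

# Write the file while the figure still exists, then release it
fig.savefig(out_path, dpi=150)
plt.close(fig)

print("Saved figure to", out_path)
```

In an interactive session, `plt.show()` could replace `plt.close(fig)`, as long as it comes after `fig.savefig(...)`.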

#### Boxplot

In [None]:
# Random normally distributed data
data = np.random.randn(100)

plt.figure(figsize=(7,4))
plt.boxplot(data)
plt.title("Boxplot Example")
plt.xlabel("Data")
plt.ylabel("Values")
plt.tight_layout()
plt.show()
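
A boxplot is a drawing of a few summary statistics. As a rough sketch, assuming Matplotlib's default whisker setting (`whis=1.5`), the values behind the box and whiskers can be computed directly with NumPy. The seeded generator `rng` and the variable names are illustrative choices, not part of the original cell.

```python
import numpy as np

# Seeded generator (an assumption) so the numbers are reproducible
rng = np.random.default_rng(42)
data = rng.standard_normal(100)

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range

# With the default setting, points beyond 1.5 * IQR from the box count as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that still lie inside the fences
lower_whisker = data[data >= lower_fence].min()
upper_whisker = data[data <= upper_fence].max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"Whiskers: [{lower_whisker:.2f}, {upper_whisker:.2f}]")
print(f"Number of outliers: {len(outliers)}")
```

Comparing these numbers with the figure is a quick way to check how the box, whiskers, and individual outlier points are read.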


#### Bar Plot

In [None]:
categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 36]

plt.figure(figsize=(7,4))
plt.bar(categories, values, color="skyblue")
plt.title("Bar Plot Example")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.tight_layout()
plt.show()


### 2.3.2. Visualization using Seaborn

Seaborn is a powerful and beginner-friendly data visualization library built on top of Matplotlib.
It is designed to make it easier to create beautiful and informative statistical graphics.

Seaborn is especially helpful when you're working with pandas DataFrames, as it integrates directly with them and often requires less code than Matplotlib.
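
To make the "less code" point concrete, here is a minimal sketch that draws the same grouped scatter plot twice: once with plain Matplotlib and once with Seaborn. The small DataFrame `df` is made-up illustrative data (its column names only mimic a restaurant-bill table), and the side-by-side layout is an assumption chosen for comparison.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up DataFrame, for illustration only
df = pd.DataFrame({
    "total_bill": [10.5, 20.0, 15.2, 30.1, 25.4, 12.8],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.0, 2.2],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the data by group and label the axes and legend by hand
for name, group in df.groupby("time"):
    ax_mpl.scatter(group["total_bill"], group["tip"], label=name)
ax_mpl.set_xlabel("total_bill")
ax_mpl.set_ylabel("tip")
ax_mpl.legend(title="time")
ax_mpl.set_title("Matplotlib")

# Seaborn: one call; grouping, colors, axis labels, and legend come from the DataFrame
sns.scatterplot(data=df, x="total_bill", y="tip", hue="time", ax=ax_sns)
ax_sns.set_title("Seaborn")

fig.tight_layout()
# In a notebook the figure renders at the end of the cell; in a script, call plt.show()
```

Both panels show the same points; the difference is that Seaborn reads column names directly from the DataFrame instead of requiring a manual loop over groups.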

In [None]:
# Importing
import seaborn as sns # for statistical plotting
import matplotlib.pyplot as plt # for plotting

In [None]:
# Loading a dataset example
tips = sns.load_dataset("tips")
print(tips.head())

In [None]:
# Scatter plot with regression line between total_bill and tip
plt.figure(figsize=(7,4))
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time", style="smoker", s=100)
sns.regplot(data=tips, x="total_bill", y="tip", scatter=False, color="black")
plt.title("Scatter Plot with Regression Line")
plt.tight_layout()
plt.show()

In [None]:
# Boxplot of the total bill for each day of the week
plt.figure(figsize=(7,4))
sns.boxplot(x='day', y='total_bill', data=tips, palette='Set2')
plt.title("Boxplot of Total Bill by Day")
plt.tight_layout()
plt.show()


In [None]:
# Histogram
plt.figure(figsize=(7,4))
sns.histplot(tips['total_bill'], bins=20, kde=True, color='blue')
plt.title("Histogram of Total Bill")
plt.tight_layout()
plt.show()

In [None]:
# Pairplot
sns.pairplot(tips, hue='time', palette='coolwarm')
plt.suptitle("Pairplot of Tips Dataset", y=1.02)
plt.show()

In [None]:
# Exercise: repeat the plots above using the 'sex' and 'smoker' columns (for example as hue)
