## **Principal Component Analysis (PCA)** 🚀😊  

---

## **📌 Step 1: Import Required Libraries**
```python
import numpy as np  # 🔢 For numerical computations
import pandas as pd  # 🏷️ Handling datasets
import matplotlib.pyplot as plt  # 📊 For visualization
from mpl_toolkits.mplot3d import Axes3D  # 🌍 3D plot
```
### **🔹 Explanation:**
- **`numpy`**: Used for numerical operations like matrix manipulations.  
- **`pandas`**: Used for handling datasets in a tabular format.  
- **`matplotlib.pyplot`**: Used to plot graphs.  
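- **`mpl_toolkits.mplot3d`**: Registers Matplotlib's `'3d'` projection used in Step 6. Since Matplotlib 3.2 the projection is available without this import, so it is only strictly needed on older versions.  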

---

## **📌 Step 2: Define the Dataset**
```python
data = np.array([
    [2.5, 2.4, 3.5],  
    [0.5, 0.7, 2.2],  
    [2.2, 2.9, 3.1],  
    [1.9, 2.2, 3.8],  
    [3.1, 3.0, 3.3]  
])

df = pd.DataFrame(data, columns=["Feature 1", "Feature 2", "Feature 3"])
print("Dataset:\n", df)
```
### **🔹 Explanation:**
- **`data`**: This is a 5×3 matrix where each row is a data point, and each column is a feature.  
- **`pd.DataFrame(data, columns=[...])`**: Converts the NumPy array into a pandas DataFrame with named columns.  
- **`print(df)`**: Displays the dataset in a tabular format.  

---

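## **📌 Step 3: Centering the Data**
```python
mean_vector = np.mean(data, axis=0)  # Compute the mean of each feature (column)
standardized_data = data - mean_vector  # Center the data (mean subtraction only, no scaling)

print("\nCentered Data:\n", standardized_data)
```
### **🔹 Explanation:**
- **Why Center?** 🤔  
  PCA looks for the directions of maximum variance **around the mean**, so we **center** the data (mean = 0 in every column) by subtracting each column's mean from all values in that column. `np.cov` removes the mean internally anyway, but centering matters in Step 5: without it, every projected value would be shifted by the projection of the mean vector.  
  This step only centers. Full **standardization** would also divide each column by its standard deviation, which is the usual choice when features are measured in different units or on very different scales.  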

- **`np.mean(data, axis=0)`**: Computes the **mean of each column (feature-wise mean).**  
- **`data - mean_vector`**: Subtracts the mean from the dataset to center it around **zero mean**.  
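
Here is a small sketch contrasting mean-centering (what Step 3 does) with full z-score standardization, reusing the Step 2 dataset. The names `centered` and `z_scores` are illustrative, not part of the original code. Using `ddof=1` matches the *n − 1* normalization that `np.cov` applies.

```python
import numpy as np

# Same 5×3 dataset as in Step 2 (rows = samples, columns = features)
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])

# Mean-centering: every column ends up with mean 0, spreads are unchanged
centered = data - data.mean(axis=0)

# Z-score standardization: additionally divide by each column's sample std (ddof=1),
# so every column has mean 0 and standard deviation 1
z_scores = centered / data.std(axis=0, ddof=1)

print("Column means after centering:", np.round(centered.mean(axis=0), 10))
print("Column stds after z-scoring: ", z_scores.std(axis=0, ddof=1))
```

Centering alone is reasonable when all features share the same units, as in this toy dataset; z-scoring is the usual choice when they don't, because otherwise the feature with the largest numeric spread dominates the covariance matrix.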

---

## **📌 Step 4: Compute Covariance Matrix & Eigenvalues/Vectors**
```python
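cov_matrix = np.cov(standardized_data.T)  # 3×3 covariance matrix between the features
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)  # eigh: solver for symmetric matrices

print("\nCovariance Matrix:\n", cov_matrix)
print("\nEigenvalues:\n", eigenvalues)
print("\nEigenvectors:\n", eigenvectors)
```
### **🔹 Explanation:**
- **What is the covariance matrix?** 📏  
  - Entry *(i, j)* measures how features *i* and *j* vary together; the diagonal holds each feature's variance.  
  - A positive value means two features tend to rise together; a negative value means one tends to fall as the other rises. The magnitude depends on the features' units, so the correlation (covariance divided by both standard deviations) is the better gauge of **how strong** a relationship is.  

- **`np.cov(standardized_data.T)`**:  
  - By default `np.cov()` treats each **row** as a variable, so **`.T`** transposes the data to put one feature per row. `np.cov(standardized_data, rowvar=False)` is equivalent.  
  - It returns the **sample** covariance (normalized by *n − 1*).  

- **`np.linalg.eigh(cov_matrix)`**:  
  - `eigh` is NumPy's eigensolver for symmetric matrices such as a covariance matrix. It returns real eigenvalues in **ascending** order, and column `eigenvectors[:, i]` pairs with `eigenvalues[i]`. (`np.linalg.eig` also works, but its eigenvalues are not guaranteed to be sorted.)  
  - Finds **eigenvalues** (how much variance a direction captures).  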
  - Finds **eigenvectors** (directions of new principal components).  
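
To see what `np.cov` and the eigendecomposition compute, the sketch below rebuilds the covariance matrix by hand as Xᵀ X / (n − 1) for the centered data `X`, then checks the defining property C v = λ v for every eigenvector. It uses `np.linalg.eigh`, NumPy's solver for symmetric matrices; the variable names are illustrative.

```python
import numpy as np

# Same 5×3 dataset as in Step 2 (rows = samples, columns = features)
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])

X = data - data.mean(axis=0)  # centered data, shape (n_samples, n_features)
n = X.shape[0]

# Sample covariance by hand: C = Xᵀ X / (n - 1)
cov_manual = X.T @ X / (n - 1)

# np.cov with rowvar=False treats columns as variables, so no transpose is needed
cov_numpy = np.cov(X, rowvar=False)
print("Manual and np.cov agree:", np.allclose(cov_manual, cov_numpy))

# eigh returns eigenvalues in ascending order; column i of `eigenvectors` pairs with eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eigh(cov_numpy)

# Check the defining property C v = λ v for every eigenpair
for lam, v in zip(eigenvalues, eigenvectors.T):
    residual = np.linalg.norm(cov_numpy @ v - lam * v)
    print(f"λ = {lam:.4f}, ||C v - λ v|| = {residual:.1e}")

# The eigenvalues sum to the trace of C, i.e. the total variance of all features
print("Sum of eigenvalues:", eigenvalues.sum(), "| trace of C:", np.trace(cov_numpy))
```

The last line reflects a useful fact: the eigenvalues split the total variance of the data among the principal directions, with nothing lost or added.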

---

## **📌 Step 5: Project Data onto Principal Component**
```python
top_eigenvector = eigenvectors[:, np.argmax(eigenvalues)]  # Select the top eigenvector
projected_data = standardized_data @ top_eigenvector  # Project onto principal component

print("\nProjected Data:\n", projected_data)
```
### **🔹 Explanation:**
- **Why do we project data?** 🔄  
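  - Instead of describing each point with 3 coordinates, we describe it with **1 coordinate**: its position along the direction of maximum variance. This **reduces the dimensions** while keeping as much of the data's spread as a single axis can hold.  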

- **`np.argmax(eigenvalues)`**:  
  - Finds the **index of the largest eigenvalue** (most important principal component).  

- **`eigenvectors[:, np.argmax(eigenvalues)]`**:  
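  - Selects the **corresponding eigenvector** (direction of maximum variance). Its sign is arbitrary: the solver may return `v` or `-v`, so all projected values can come out negated without changing the result geometrically.  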

- **`standardized_data @ top_eigenvector`**:  
  - **Matrix multiplication** `@` projects data onto the new principal component.  
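
The same idea extends to keeping more than one component. This sketch (an illustrative generalization, not part of the original walkthrough) sorts the eigenpairs from largest to smallest eigenvalue, stacks the top `k` eigenvectors into a matrix `W`, and projects with `X @ W`. The choice `k = 2` is arbitrary.

```python
import numpy as np

# Same 5×3 dataset as in Step 2
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])

X = data - data.mean(axis=0)  # center (Step 3)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))  # Step 4

# Sort eigenpairs from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

k = 2                    # number of components to keep (illustrative choice)
W = eigenvectors[:, :k]  # projection matrix, shape (3, k)
scores = X @ W           # projected data, shape (5, k)

print(f"Projected data (first {k} components):\n", scores)
```

Each column of `scores` has a sample variance equal to the matching eigenvalue, which is why the eigenvalue is read as the "variance captured" by that component. As with a single component, any column may come out with its sign flipped.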

---

## **📌 Step 6: Visualizing the Data**
### **🔹 3D Scatter Plot of Original Data**
```python
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

ax.scatter(standardized_data[:, 0], standardized_data[:, 1], standardized_data[:, 2], c='b', label='Original Data')

ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_zlabel("Feature 3")
ax.set_title("Original Data in 3D Space")
ax.legend()
plt.show()
```
### **🔹 Explanation:**
- **Creates a 3D plot of the mean-centered data** (the original points shifted so every feature has mean 0; the shape of the point cloud is unchanged).  
- **`.scatter()`** plots one blue dot per sample.  
- **`ax.set_xlabel()`, `ax.set_ylabel()`, `ax.set_zlabel()`** add axis labels.  

---

### **🔹 1D Projection onto the First Principal Component**
```python
plt.figure(figsize=(8, 5))
plt.scatter(projected_data, np.zeros_like(projected_data), c='r', label="Projected Data")

plt.axhline(0, color='black', linewidth=0.5)  # Draw a horizontal line at 0
plt.xlabel("Principal Component 1")
plt.title("Data Projected onto First Principal Component")
plt.legend()
plt.show()
```
### **🔹 Explanation:**
- **Creates a 2D scatter plot where data is projected onto the first principal component.**  
- **Data is now reduced from 3D to 1D (X-axis only).**  
- **`np.zeros_like(projected_data)`** makes Y values **zero** to align data on a single line.  
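
Going from 3D to 1D discards information, and that loss can be measured by mapping the 1D scores back into 3D. A minimal sketch, assuming the same data and top eigenvector as in Steps 3 to 5 (variable names are illustrative): each reconstructed point is `mean + score * top`, which lies on the principal axis.

```python
import numpy as np

# Same 5×3 dataset as in Step 2
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])

mean = data.mean(axis=0)
X = data - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))

top = eigenvectors[:, np.argmax(eigenvalues)]  # unit-length first principal direction
scores = X @ top                               # 1D coordinates, shape (5,)

# Map the 1D scores back into 3D: each point lands on the line mean + t * top
reconstructed = np.outer(scores, top) + mean

# Squared error lost by dropping the other two components
error = np.sum((data - reconstructed) ** 2)
print("Reconstructed data:\n", reconstructed)
print("Total squared reconstruction error:", error)
```

The total squared error equals (n − 1) times the sum of the two discarded eigenvalues, so small second and third eigenvalues mean the 1D projection loses little.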

---

## **🎯 Summary of Steps**
| **Step** | **What It Does** |
|----------|------------------|
| 1️⃣ Import Libraries | Loads necessary Python libraries. |
| 2️⃣ Define Dataset | Creates a dataset with 3 features and 5 samples. |
| 3️⃣ Center Data | Centers data by subtracting the mean of each feature. |
| 4️⃣ Compute Eigenvalues & Eigenvectors | Finds principal components using covariance matrix. |
| 5️⃣ Project Data | Reduces 3D data to 1D using top eigenvector. |
| 6️⃣ Visualize | Plots original and transformed data. |
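
Steps 3 to 5 of the table can be bundled into one function. The sketch below defines a hypothetical helper `pca(data, k)` (not a library API) and cross-checks its eigenvalues against `np.linalg.svd` of the centered data, a common alternative route to the same principal components.

```python
import numpy as np


def pca(data: np.ndarray, k: int):
    """Hypothetical helper bundling Steps 3 to 5: returns (scores, components, eigenvalues)."""
    X = data - data.mean(axis=0)                                         # Step 3: center
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))  # Step 4
    order = np.argsort(eigenvalues)[::-1]                                # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    components = eigenvectors[:, :k]                                     # top-k directions as columns
    return X @ components, components, eigenvalues                       # Step 5: project


# Same 5×3 dataset as in Step 2
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])
scores, components, eigenvalues = pca(data, k=1)

# Cross-check with the SVD of the centered data: the right singular vectors (rows of Vt)
# are the principal directions, and squared singular values / (n - 1) are the eigenvalues
X = data - data.mean(axis=0)
_, s, Vt = np.linalg.svd(X, full_matrices=False)
print("Eigenvalues via eigh:", eigenvalues)
print("Eigenvalues via SVD: ", s**2 / (X.shape[0] - 1))
```

The SVD route avoids forming the covariance matrix explicitly, which is why it is often preferred numerically; both give the same directions up to sign.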

---

## **📌 Key Takeaways**
✅ **PCA transforms high-dimensional data into fewer dimensions while preserving as much variance as possible.**  
✅ **Eigenvectors define the new axes of transformation.**  
✅ **Each eigenvalue is the variance of the data along its eigenvector, so it indicates that axis's importance (higher = more variance explained).**  
✅ **Visualization helps understand how data transforms from 3D to 1D.**  
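
To put a number on "higher = more variance explained", each eigenvalue can be divided by their sum to get the share of total variance per component. A short sketch on the same dataset; `explained_ratio` and `cumulative` are illustrative names.

```python
import numpy as np

# Same 5×3 dataset as in Step 2
data = np.array([
    [2.5, 2.4, 3.5],
    [0.5, 0.7, 2.2],
    [2.2, 2.9, 3.1],
    [1.9, 2.2, 3.8],
    [3.1, 3.0, 3.3],
])

X = data - data.mean(axis=0)
# eigvalsh returns eigenvalues of a symmetric matrix in ascending order; reverse to descending
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

explained_ratio = eigenvalues / eigenvalues.sum()  # share of total variance per component
cumulative = np.cumsum(explained_ratio)            # variance kept by the first k components

for i, (r, c) in enumerate(zip(explained_ratio, cumulative), start=1):
    print(f"PC{i}: {r:.1%} of variance (cumulative {c:.1%})")
```

The cumulative column is a common way to decide how many components to keep, for example the smallest k that reaches some target share of the total variance.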

---
🚀😊