# 🌳 Decision Tree & Random Forest Summary

## **1. `rpart` (CART - Classification and Regression Trees)**
- Builds a **single** decision tree.
- Uses **all data** for training.
- Splits are based on **all available features** at each node.
- **Prone to overfitting** if the tree grows too deep.
- Can be pruned with **`prune()`**, using the CP table from **`printcp()`** to choose the complexity parameter and avoid overfitting.
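
A minimal sketch of the pruning workflow described above, using the `kyphosis` data that ships with `rpart`. The `cp = 0` and `minsplit = 5` settings are illustrative assumptions chosen only to grow a large tree, and picking the CP with the lowest cross-validated error (`xerror`) is one common rule, not the only one.

```r
library(rpart)

set.seed(42)  # rpart's internal cross-validation is random

# Grow a deliberately large tree (control values are illustrative)
full_tree <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
                   control = rpart.control(cp = 0, minsplit = 5))

# cptable columns: CP, nsplit, rel error, xerror, xstd
cp_table <- full_tree$cptable

# Choose the CP value with the lowest cross-validated error
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Cut the tree back to that complexity
pruned_tree <- prune(full_tree, cp = best_cp)

# Compare tree sizes (number of nodes) before and after pruning
c(full = nrow(full_tree$frame), pruned = nrow(pruned_tree$frame))
```

`printcp(full_tree)` prints the same table that `full_tree$cptable` holds, so the choice of `best_cp` can also be made by eye.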

### **Useful Functions:**
- `rpart()` → Builds a decision tree.
- `printcp(tree)` → Displays the complexity parameter (CP) table for pruning.
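- `rpart.plot()` → Simplified tree plot with sensible defaults (from the `rpart.plot` package).
- `prp()` → Fully customizable tree visualization (also from `rpart.plot`; `rpart.plot()` is a front end to it).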

---

## **2. `randomForest` (Random Forest)**
- Builds **multiple decision trees** (ensemble learning).
- Uses **bagging (bootstrap aggregation)**: each tree is trained on a **bootstrap sample** of the data (rows drawn at random, with replacement).
- Uses **random feature selection** at each split (controlled by `mtry`) to reduce correlation between trees.
- **Averaging (regression) or majority voting (classification)** improves generalization.
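
The points above can be sketched with the `randomForest` package on the built-in `iris` data. This assumes `randomForest` is installed; the `ntree` and `mtry` values are illustrative, not tuned.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)  # bootstrap samples and feature subsets are random

rf <- randomForest(Species ~ ., data = iris,
                   ntree = 500,  # number of trees, each on its own bootstrap sample
                   mtry  = 2)    # features considered at each split

# Out-of-bag (OOB) error after the last tree: an estimate of
# generalization error from rows each tree did not see
oob <- rf$err.rate[rf$ntree, "OOB"]
oob

print(rf)  # summary including the OOB confusion matrix
```

Because every tree leaves out some rows of its bootstrap sample, the OOB error gives a built-in validation estimate without a separate holdout set.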

### **Key Advantages of Random Forest:**
✅ Reduces **variance** (less overfitting).  
✅ More **robust and accurate** than a single decision tree.  
✅ Works well on large datasets with many features.

### **How It Aggregates Predictions:**
- **Regression:** Averages the predictions from all trees.
- **Classification:** Uses majority voting across trees.
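
To make the aggregation concrete, the sketch below asks `predict()` for per-tree predictions via `predict.all = TRUE` and recombines them by hand. It assumes the `randomForest` package is installed; the data sets (`mtcars`, `iris`) and `ntree = 200` are arbitrary choices for illustration.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(1)

# Regression: the forest's prediction is the mean over trees
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 200)
p <- predict(rf_reg, newdata = mtcars[1:5, ], predict.all = TRUE)

# p$individual has one row per observation and one column per tree
manual_mean <- rowMeans(p$individual)
same_as_forest <- isTRUE(all.equal(unname(manual_mean), unname(p$aggregate)))
same_as_forest

# Classification: the forest's prediction is the most frequent class label
rf_cls <- randomForest(Species ~ ., data = iris, ntree = 200)
q <- predict(rf_cls, newdata = iris[c(1, 51, 101), ], predict.all = TRUE)

# Count the votes for the first observation and take the winner
votes_row1 <- table(q$individual[1, ])
manual_vote <- names(which.max(votes_row1))
manual_vote
```

For classification, exact ties between classes are broken at random by the package, so a hand-computed vote can differ from `q$aggregate` in that rare case.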

---

## **Key Differences Between `rpart` and `randomForest`**
| Feature            | `rpart` (CART)      | `randomForest` |
|--------------------|--------------------|---------------|
| **Number of Trees** | 1 (Single Tree)    | Multiple Trees |
| **Data Used**      | Entire training set | Bootstrap sample per tree |
| **Feature Selection** | Considers all features at each split | Random subset at each split |
| **Overfitting Risk** | High if not pruned | Lower, since averaging many trees reduces variance |
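
As a rough illustration of the comparison in the table, the sketch below fits both models on the same training split and scores them on held-out rows. It assumes the `randomForest` package is installed; the 100/50 split of `iris` and the seed are arbitrary. On a small, easy data set like this the two models can score similarly, so treat the output as a template for your own data rather than as evidence of a general ranking.

```r
library(rpart)
library(randomForest)  # assumes the randomForest package is installed

set.seed(123)

# Hold out 50 of the 150 rows for testing
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# One tree on the full training set vs. an ensemble of bootstrapped trees
tree_fit <- rpart(Species ~ ., data = train, method = "class")
rf_fit   <- randomForest(Species ~ ., data = train)

tree_pred <- predict(tree_fit, newdata = test, type = "class")
rf_pred   <- predict(rf_fit, newdata = test)

# Share of held-out rows classified correctly by each model
accuracy <- c(rpart        = mean(tree_pred == test$Species),
              randomForest = mean(rf_pred == test$Species))
accuracy
```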

---

## **Best Practices**
- Use `rpart` when **interpretability** is important and pruning can control overfitting.
- Use `randomForest` when **accuracy and generalization** are more important than a single tree’s explainability.
- Use `plotcp()` to choose a CP value and `prune()` to cut the tree back when **optimizing an `rpart` tree**.
- Use **`randomForest()`** for high-dimensional data with many features.

---

### **Example Code Snippets**

#### **Building a Decision Tree with `rpart`**
```r
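library(rpart)  # also provides the kyphosis data set
# Classification tree: predict Kyphosis from all other columns
tree <- rpart(Kyphosis ~ ., method = "class", data = kyphosis)
printcp(tree)  # Show complexity parameter table
rpart.plot::prp(tree)  # Quick plot (requires the rpart.plot package)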
