# 🌳 Cost Function in Random Forest

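Unlike linear regression, a Random Forest is not trained by minimizing a single global cost function. Instead, each tree greedily optimizes a local split criterion, and the forest as a whole is judged by a task-level metric: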

### 1. **At Tree Level (Split Criterion)**

* **Classification:**
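  Each split is chosen to minimize the size-weighted **impurity** of the child nodes, where $p_k$ is the fraction of samples in node $S$ that belong to class $k$.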

  * **Gini Impurity:**

    $$
    Gini(S) = 1 - \sum_{k=1}^K p_k^2
    $$
  * **Entropy** (the drop in entropy produced by a split is the **information gain**):

    $$
    H(S) = -\sum_{k=1}^K p_k \log(p_k)
    $$

* **Regression:**
  Splits minimize the **variance (MSE)** of the target values within each child node, where $\bar{y}$ is the mean target of the $n$ samples in the node:

  $$
  MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2
  $$
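
To make the split criteria concrete, here is a minimal NumPy sketch of the three formulas and of the impurity decrease a candidate split achieves. The helper names (`gini`, `entropy`, `impurity_decrease`) and the toy arrays are illustrative assumptions, not part of any library, and entropy is computed in bits (log base 2).

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

# Classification: a balanced parent node and a split that separates it perfectly
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]

gini_parent = gini(parent)                            # 0.5
entropy_parent = entropy(parent)                      # 1.0 bit
gini_gain = impurity_decrease(parent, left, right, gini)  # 0.5 (children are pure)

# Regression: same idea, with variance as the impurity measure
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
var_gain = impurity_decrease(y, y[:3], y[3:], np.var)  # 20.25

print(gini_parent, entropy_parent, gini_gain, var_gain)
```

A tree evaluates many candidate splits this way and keeps the one with the largest decrease.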

---

### 2. **At Forest Level**

* Final prediction = **aggregate of trees**:

  * Regression → average of predictions.
  * Classification → majority vote (or average of predicted probabilities).

* The forest is not trained to minimize a loss directly, but its error is typically measured with a task-specific loss:

  * **Regression Random Forest:**

    $$
    J_{RF} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i^{RF})^2
    $$
  * **Classification Random Forest (log loss):**

    $$
    J_{RF} = - \frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^K y_{i,k} \log(\hat{p}_{i,k}^{RF})
    $$
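
The aggregation and the two forest-level losses can be checked directly in scikit-learn. The sketch below assumes synthetic data from `make_regression` and `make_classification` and small forests of 25 trees. It computes the losses on the training data purely for illustration, so the values are optimistic; a held-out set would be used in practice.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import log_loss, mean_squared_error

# Regression: the forest prediction is the mean of the per-tree predictions
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=25, random_state=0).fit(X_reg, y_reg)
tree_preds = np.stack([tree.predict(X_reg) for tree in reg.estimators_])
forest_pred = reg.predict(X_reg)
avg_matches = np.allclose(tree_preds.mean(axis=0), forest_pred)
mse = mean_squared_error(y_reg, forest_pred)  # J_RF for regression

# Classification: scikit-learn averages the per-tree class probabilities
X_clf, y_clf = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_clf, y_clf)
tree_probas = np.stack([tree.predict_proba(X_clf) for tree in clf.estimators_])
forest_proba = clf.predict_proba(X_clf)
proba_matches = np.allclose(tree_probas.mean(axis=0), forest_proba)
ll = log_loss(y_clf, forest_proba)  # J_RF for classification

print("mean of trees == forest (regression):", avg_matches)
print("mean of tree probabilities == forest (classification):", proba_matches)
print(f"training MSE={mse:.2f}, training log-loss={ll:.3f}")
```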

✅ Takeaway:

* Trees minimize **impurity** locally.
* The forest is evaluated by global error metrics like **MSE, accuracy, log-loss**.

---

# ⚙️ Hyperparameter Tuning in Random Forest

Random Forests have several **hyperparameters** that control bias, variance, and speed:

---

## 1. **Tree-related parameters**

* `max_depth`: Maximum depth of each tree.

  * Shallow → high bias, low variance.
  * Deep → low bias, high variance.
* `min_samples_split`: Minimum samples required to split a node (larger → fewer splits, simpler trees).
* `min_samples_leaf`: Minimum samples required at a leaf (larger → smoother predictions).
* `max_features`: Number of features to consider when splitting.

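  * Common: `"sqrt"` (the scikit-learn default for classification), `"log2"`, or a fraction. For regression, scikit-learn defaults to using all features.
  * Controls correlation between trees: fewer features per split gives more diverse (less correlated) trees, at the cost of weaker individual trees.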
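
As a rough illustration of the depth trade-off, the sketch below fits a very shallow forest and a fully grown one on synthetic, deliberately noisy data (`flip_y=0.1` is an assumption chosen for the demo) and compares training and test accuracy. The exact numbers depend on the data, so none are quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data with 10% label noise (illustrative only)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

results = {}
for depth in (2, None):  # None lets each tree grow until its leaves are pure
    rf = RandomForestClassifier(n_estimators=100, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    results[depth] = (rf.score(X_train, y_train), rf.score(X_test, y_test))
    print(f"max_depth={depth}: train={results[depth][0]:.3f}, "
          f"test={results[depth][1]:.3f}")
```

A large gap between training and test accuracy for the deep forest points to the variance side of the trade-off, while low accuracy on both for the shallow forest points to bias.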

---

## 2. **Forest-related parameters**

* `n_estimators`: Number of trees in the forest.

  * More trees → lower variance but more computation; the gains level off once the forest is large enough.
* `bootstrap`: Whether to use bootstrap sampling (default=True).
* `max_samples`: Number (int) or fraction (float) of samples drawn to train each tree (only used if `bootstrap=True`).
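
One way to see the variance reduction from `n_estimators` is to refit the same forest with different random seeds and measure how much the predictions move. The sketch below uses synthetic data, and the helper `seed_to_seed_std` is a hypothetical name introduced here. It captures only the variability caused by bootstrap and feature sampling, which is a proxy for, not the whole of, model variance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

def seed_to_seed_std(n_trees, seeds=range(5)):
    """Mean over test points of the std, across seeds, of P(class 1)."""
    probas = np.stack([
        RandomForestClassifier(n_estimators=n_trees, bootstrap=True, random_state=s)
        .fit(X_train, y_train)
        .predict_proba(X_test)[:, 1]
        for s in seeds
    ])
    return probas.std(axis=0).mean()

spread_small = seed_to_seed_std(5)    # few trees: predictions vary a lot by seed
spread_large = seed_to_seed_std(200)  # many trees: predictions are more stable

print(f"5 trees: {spread_small:.4f}   200 trees: {spread_large:.4f}")
```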

---

## 3. **Regularization parameters**

* `max_leaf_nodes`: Maximum number of leaf nodes.
* `ccp_alpha`: Cost-complexity pruning parameter (post-pruning); larger values prune more aggressively.
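
The effect of both parameters on tree size can be inspected through the fitted trees. This sketch assumes synthetic noisy data and arbitrary demo values (`ccp_alpha=0.01`, `max_leaf_nodes=16`); `total_nodes` is a hypothetical helper defined here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

def total_nodes(forest):
    """Sum of node counts over all trees in a fitted forest."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)

# Same seed everywhere, so the trees differ only through the regularization
unpruned = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pruned = RandomForestClassifier(n_estimators=50, ccp_alpha=0.01,
                                random_state=0).fit(X, y)
capped = RandomForestClassifier(n_estimators=50, max_leaf_nodes=16,
                                random_state=0).fit(X, y)

nodes_unpruned = total_nodes(unpruned)
nodes_pruned = total_nodes(pruned)
max_leaves = max(tree.get_n_leaves() for tree in capped.estimators_)

print("nodes without pruning:", nodes_unpruned)
print("nodes with ccp_alpha=0.01:", nodes_pruned)
print("largest leaf count with max_leaf_nodes=16:", max_leaves)
```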

---

## 4. **How to Tune**

* Use **Grid Search** or **Random Search** with **Cross-Validation**.
* Common strategy:

  1. Start with `n_estimators` high enough (e.g., 200+).
  2. Tune `max_depth`, `min_samples_split`, `min_samples_leaf`.
  3. Adjust `max_features` to control correlation between trees.
  4. Optionally use the **Out-of-Bag (OOB) error** as a cheaper alternative to cross-validation (requires `bootstrap=True`).
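
Step 4 can be sketched as a plain loop that fits one forest per candidate setting and compares OOB scores, with no cross-validation folds involved. The data is synthetic and the candidate list is an assumption for the demo. Because OOB scores are noisy estimates, small differences between settings should not be over-interpreted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

candidates = ["sqrt", "log2", 0.5]
oob_by_setting = {}
for max_features in candidates:
    rf = RandomForestClassifier(
        n_estimators=200,           # step 1: enough trees for a stable OOB estimate
        max_features=max_features,  # step 3: the parameter being tuned here
        oob_score=True,             # requires bootstrap=True (the default)
        random_state=0,
        n_jobs=-1,
    )
    rf.fit(X, y)
    oob_by_setting[max_features] = rf.oob_score_  # OOB accuracy for this setting

best = max(oob_by_setting, key=oob_by_setting.get)
print("OOB accuracy per setting:", oob_by_setting)
print("Best max_features by OOB:", best)
```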

---

# 🎯 Example (Scikit-Learn)

```python
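from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Placeholder data so the example runs end to end; replace with your own dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Model (oob_score=True requires bootstrap=True, which is the default)
rf = RandomForestClassifier(oob_score=True, random_state=42)

# Hyperparameter search space
param_distributions = {
    'n_estimators': [100, 200, 500],
    'max_depth': [None, 5, 10, 20],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'max_features': ['sqrt', 'log2', 0.5]
}

# Randomized search: samples 20 combinations, each scored with 5-fold CV
search = RandomizedSearchCV(rf, param_distributions=param_distributions,
                            n_iter=20, cv=5, scoring='accuracy',
                            n_jobs=-1, random_state=42)

search.fit(X_train, y_train)

print("Best Params:", search.best_params_)
print("Best CV Accuracy:", search.best_score_)
# best_estimator_ is refit on all of X_train, so its OOB score is available
print("OOB Score:", search.best_estimator_.oob_score_)
print("Test Accuracy:", search.score(X_test, y_test))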
```

---

✅ **Summary:**

* **Cost function** in Random Forest: impurity measures guide each split locally, while global metrics (MSE/log-loss) evaluate the forest as a whole.
* **Hyperparameters** control tree depth, number of trees, and randomness.
* **Tuning** = balance bias vs. variance for best generalization.

