### Random Forests

#### Introduction
Random Forest is an ensemble learning method used for classification and regression tasks. It constructs many decision trees during training and combines their outputs: the most frequently predicted class for classification, or the mean of the trees' predictions for regression.
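
The regression case can be checked directly. The sketch below is illustrative only: it assumes scikit-learn's `RandomForestRegressor` and a synthetic dataset from `make_regression` (neither the data nor the settings come from this notebook), and compares the forest's prediction with the mean of the individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used purely for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Predictions of every individual tree, shape (n_trees, n_samples).
per_tree = np.array([tree.predict(X) for tree in forest.estimators_])

# Averaging over the tree axis should reproduce the forest's own prediction.
manual_mean = per_tree.mean(axis=0)
forest_pred = forest.predict(X)
matches = np.allclose(manual_mean, forest_pred)
print("Forest prediction equals mean of tree predictions:", matches)
```

For classification, note that scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, so its result can occasionally differ from a strict majority vote.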

#### How It Works
1. **Bootstrapping (Bagging)**:
   - For each tree, the algorithm draws a bootstrap sample: training examples selected at random with replacement.
   - Each decision tree is trained on its own bootstrap sample, so the trees see different versions of the data.

2. **Feature Randomness**:
   - At each split in a tree, a random subset of features is considered instead of all features.
   - This reduces the correlation among trees, so combining their predictions lowers variance and makes the model more robust.

3. **Majority Voting / Averaging**:
   - For classification: Each tree makes a prediction, and the most common class is chosen (majority voting).
   - For regression: The predictions from all trees are averaged to get the final result.
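
The three steps above can be sketched by hand. This is a minimal illustration, not how any particular library implements the algorithm: it assumes scikit-learn's `DecisionTreeClassifier` as the base learner, the breast cancer dataset, and arbitrary choices such as `n_trees = 25` and a 75/25 split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a binary majority vote cannot tie
n_train = X_train.shape[0]
trees = []

for i in range(n_trees):
    # Step 1: bootstrap sample, drawn with replacement, same size as the training set.
    idx = rng.integers(0, n_train, size=n_train)
    # Step 2: max_features="sqrt" makes each split consider a random feature subset.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Step 3: majority vote over the trees' predicted labels (0 or 1 in this dataset).
votes = np.array([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
majority = (votes.sum(axis=0) > n_trees / 2).astype(int)

accuracy = (majority == y_test).mean()
print(f"Hand-rolled forest accuracy: {accuracy:.3f}")
```

For regression, step 3 would instead average the trees' numeric predictions with `votes.mean(axis=0)`.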

#### Advantages
- Reduces overfitting compared to a single decision tree.
- Works well with large datasets and high-dimensional data.
- Some implementations can handle missing values directly; others require imputing them first.
- Can handle both classification and regression tasks.
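
The overfitting claim can be probed with cross-validation. The following is a rough sketch under assumed settings (the breast cancer dataset, 5 folds, 100 trees, default hyperparameters otherwise). The forest often scores higher than the single tree on data like this, but that outcome is not guaranteed for every dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# A single fully grown tree versus a forest of 100 trees.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy for each model.
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print(f"Single tree:   {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"Random forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```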

#### Disadvantages
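- Requires more computational resources (training time and memory) than a single decision tree, since many trees must be built and stored.
- Is less interpretable than a single decision tree, because each prediction combines many trees instead of following one readable path.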
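
One common way to recover some interpretability is to inspect feature importances. The sketch below assumes scikit-learn's `RandomForestClassifier`, whose fitted `feature_importances_` attribute holds impurity-based importances; the dataset and the choice to show the top five features are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Impurity-based importances: one value per feature, normalized to sum to 1.
importances = forest.feature_importances_

# Indices of the five most important features, highest first.
top = np.argsort(importances)[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} {importances[i]:.3f}")
```

These scores summarize the whole forest and do not explain individual predictions, so they only partly offset the loss of interpretability.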

#### Hyperparameters
Some key hyperparameters to tune in Random Forests:
- `n_estimators`: Number of trees in the forest.
- `max_depth`: Maximum depth of each tree.
- `min_samples_split`: Minimum number of samples required to split an internal node.
- `min_samples_leaf`: Minimum number of samples required at a leaf node.
- `max_features`: Number of features to consider for each split.
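
A small grid search is one way to tune these hyperparameters. This is a sketch under stated assumptions: the grid values, the 3-fold split, and the breast cancer dataset are illustrative choices, and `min_samples_split` and `min_samples_leaf` are held at scikit-learn's defaults to keep the search short.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Candidate values are illustrative, not recommendations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],         # None lets each tree grow until its leaves are pure
    "max_features": ["sqrt", 0.5],  # sqrt of the feature count, or half of the features
}

# Hyperparameters left out of the grid are fixed at their default values here.
base = RandomForestClassifier(min_samples_split=2, min_samples_leaf=1, random_state=0)

search = GridSearchCV(base, param_grid, cv=3)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Larger grids grow multiplicatively in cost, so a randomized search such as `RandomizedSearchCV` may be a better fit when many hyperparameters are tuned at once.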




```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
```