## 1. What is Ensemble Learning in machine learning?
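Ensemble Learning is a technique that combines the predictions of multiple models, for example by voting or averaging, to improve predictive performance. Several weak or diverse learners make partly independent errors. When their outputs are aggregated, those errors tend to cancel out, so the ensemble is usually more accurate and more robust than any single member.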
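A minimal sketch of the idea uses scikit-learn's `VotingClassifier` on the breast cancer dataset. Three different model types are combined by majority (hard) vote, and their test accuracies are printed next to the ensemble's. The choice of members, the scaling pipelines and the 70/30 split are illustrative assumptions, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Three diverse base learners; the linear and distance-based models get feature scaling.
members = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("tree", DecisionTreeClassifier(random_state=42)),
]

# Hard voting: each member casts one vote and the majority class wins.
ensemble = VotingClassifier(estimators=members, voting="hard")
ensemble.fit(X_train, y_train)

# Score each member on its own (VotingClassifier fits clones, so this is independent).
scores = {}
for name, model in members:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
scores["voting"] = ensemble.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name:>6}: {acc:.3f}")
```

The ensemble is not guaranteed to beat its best member on every split. It mainly helps when the members are individually reasonable and make different mistakes.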

## 2. Difference between Bagging and Boosting
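Bagging mainly reduces variance. It trains models independently (in parallel) on different bootstrap samples and then averages or votes over their predictions. Boosting mainly reduces bias. It trains models sequentially, and each new model concentrates on the instances its predecessors got wrong: AdaBoost reweights misclassified samples, while gradient boosting fits the remaining residuals.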

## 3. Bootstrap sampling and its role in Bagging
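Bootstrap sampling creates multiple datasets, each the same size as the original, by sampling with replacement. Each resample therefore contains some points several times and omits others. For large datasets, about 1 - 1/e ≈ 63.2% of the distinct original points appear in a given resample, and about 36.8% are left out. In Bagging, each base model is trained on its own bootstrap sample. This makes the models diverse, and aggregating their predictions reduces variance and overfitting.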
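The sketch below draws one bootstrap sample of row indices with NumPy and checks the in-bag / out-of-bag fractions quoted above. The dataset size `n = 10_000` and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
indices = np.arange(n)

# One bootstrap sample: n draws *with replacement* from the n row indices.
boot = rng.choice(indices, size=n, replace=True)

in_bag = np.unique(boot)                   # distinct rows that were drawn at least once
out_of_bag = np.setdiff1d(indices, in_bag)  # rows never drawn

print("fraction of rows in bag:    ", len(in_bag) / n)      # close to 1 - 1/e
print("fraction of rows out of bag:", len(out_of_bag) / n)  # close to 1/e
```

In Bagging, each base model would be trained on `X[boot], y[boot]`, and the `out_of_bag` rows are exactly the OOB samples described in the next section.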

## 4. Out-of-Bag (OOB) samples and OOB score
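OOB samples are the training points that are not included in a given model's bootstrap sample. The OOB score predicts each training point using only the models that did not see it during training, then aggregates these predictions into a score. This estimates generalization performance without a separate validation set.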
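In scikit-learn the OOB estimate is enabled with `oob_score=True` and read from the fitted `oob_score_` attribute. The sketch below prints it next to a held-out test accuracy for comparison. The 70/30 split and `n_estimators=200` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# oob_score=True scores each training row using only the trees whose bootstrap
# sample did not contain that row (this requires bootstrap=True, the default).
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
rf.fit(X_train, y_train)

print("OOB accuracy estimate: ", rf.oob_score_)
print("Held-out test accuracy:", rf.score(X_test, y_test))
```

The two numbers are typically close but not identical, since they are computed on different points.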

## 5. Feature importance: Decision Tree vs Random Forest
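A single Decision Tree can give unstable feature importances. Small changes in the training data can change which features it splits on, and an overfit tree may assign importance to noise. Random Forest averages the impurity-based importances over many trees, each trained on a different bootstrap sample with random feature subsets at each split. The result is more stable, more reliable estimates.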
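One way to check the stability claim is to refit both models on five different random 70% subsets of the breast cancer data. The sketch then measures how much each feature's importance varies across the refits. The number of repetitions, the subset size and the "mean standard deviation" summary are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree_imps, forest_imps = [], []
for seed in range(5):
    # A different random 70% subset of the data on each repetition.
    X_sub, _, y_sub, _ = train_test_split(X, y, test_size=0.3, random_state=seed)
    tree = DecisionTreeClassifier(random_state=0).fit(X_sub, y_sub)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_sub, y_sub)
    tree_imps.append(tree.feature_importances_)
    forest_imps.append(forest.feature_importances_)

# Std of each feature's importance across the 5 refits, averaged over features:
# a lower value means the importance estimates are more stable.
tree_spread = np.std(tree_imps, axis=0).mean()
forest_spread = np.std(forest_imps, axis=0).mean()
print(f"Decision Tree importance spread: {tree_spread:.4f}")
print(f"Random Forest importance spread: {forest_spread:.4f}")
```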

## 6. Random Forest Classifier (Breast Cancer Dataset)

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

data = load_breast_cancer()
X, y = data.data, data.target

# Fit on the full dataset: only the impurity-based importances are inspected here.
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

importances = pd.Series(model.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False).head(5))


worst area              0.139357
worst concave points    0.132225
mean concave points     0.107046
worst radius            0.082848
worst perimeter         0.080850
dtype: float64
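The importances above are impurity-based and come from the data the forest was fit on. As a cross-check, scikit-learn's `sklearn.inspection.permutation_importance` measures how much held-out accuracy drops when a single feature is shuffled. The sketch below applies it to a test split. The 70/30 split and `n_repeats=10` are assumptions. Strongly correlated features, such as the radius, perimeter and area columns, can share or mask each other's importance under either method.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Shuffle one feature at a time in the held-out set and record the mean drop in
# accuracy over n_repeats shuffles; a larger drop means more reliance on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)

perm = pd.Series(result.importances_mean, index=data.feature_names)
print(perm.sort_values(ascending=False).head(5))
```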


## 7. Bagging Classifier using Decision Tree (Iris Dataset)

In [2]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


base_tree = DecisionTreeClassifier(random_state=42)

bagging_model = BaggingClassifier(
    estimator=base_tree,
    n_estimators=50,
    random_state=42
)

bagging_model.fit(X_train, y_train)

y_pred = bagging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Bagging Classifier Accuracy:", accuracy)



Bagging Classifier Accuracy: 1.0


## 8. Random Forest Classifier Hyperparameter Tuning using GridSearchCV

In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {'n_estimators': [50, 100], 'max_depth': [None, 5, 10]}
rf = RandomForestClassifier(random_state=42)

grid = GridSearchCV(rf, param_grid, cv=3)
grid.fit(X_train, y_train)

best = grid.best_estimator_
print("Best Parameters:", grid.best_params_)
print("Accuracy:", accuracy_score(y_test, best.predict(X_test)))


Best Parameters: {'max_depth': None, 'n_estimators': 50}
Accuracy: 0.9707602339181286
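`best_params_` is selected by mean 3-fold cross-validation accuracy on the training split, while the printed accuracy is measured on the test split. The sketch below inspects all six grid points through the `cv_results_` attribute to show how close the candidates were. It repeats the grid search from the cell above so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5, 10]}
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
grid.fit(X_train, y_train)

# cv_results_ has one entry per parameter combination (2 x 3 = 6 here).
results = pd.DataFrame(grid.cv_results_)[
    ["params", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(results.to_string(index=False))
print("Best mean CV accuracy:", grid.best_score_)
```

If several candidates are within one standard deviation of each other, the "best" choice is not clearly better than the rest.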


## 9. Bagging vs Random Forest Regressor (California Housing Dataset)

In [7]:
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

bag = BaggingRegressor(
    # "base_estimator" was renamed to "estimator" in scikit-learn 1.2; the old name was later removed.
    estimator=DecisionTreeRegressor(),
    n_estimators=50,
    random_state=42
)

rf = RandomForestRegressor(n_estimators=100, random_state=42)

bag.fit(X_train, y_train)
rf.fit(X_train, y_train)

print("Bagging Regressor MSE:", mean_squared_error(y_test, bag.predict(X_test)))
print("Random Forest Regressor MSE:", mean_squared_error(y_test, rf.predict(X_test)))




Bagging Regressor MSE: 0.2579153056796594
Random Forest Regressor MSE: 0.25638991335459355


In [8]:
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

data = fetch_california_housing(download_if_missing=True)
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

bagging_model = BaggingRegressor(
    estimator=DecisionTreeRegressor(random_state=42),
    n_estimators=50,
    random_state=42
)

rf_model = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)

bagging_model.fit(X_train, y_train)
rf_model.fit(X_train, y_train)

bag_pred = bagging_model.predict(X_test)
rf_pred = rf_model.predict(X_test)

bag_mse = mean_squared_error(y_test, bag_pred)
rf_mse = mean_squared_error(y_test, rf_pred)

print("Bagging Regressor MSE:", bag_mse)
print("Random Forest Regressor MSE:", rf_mse)


Bagging Regressor MSE: 0.2579153056796594
Random Forest Regressor MSE: 0.25638991335459355
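The two MSEs are nearly identical, and the reason is a default setting. Since scikit-learn 1.1, `RandomForestRegressor` uses `max_features=1.0` by default, so every split considers all 8 features. With that default, the forest is essentially bagged decision trees, and the two models above differ mainly in `n_estimators`. Random feature subsets at each split, which decorrelate the trees, only come into play with a smaller `max_features`. The sketch below compares a few values. The specific values tried (`1.0`, `"sqrt"`, `0.33`) are illustrative assumptions, and which one wins on this split is not asserted.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

mse = {}
# max_features=1.0 uses all 8 features at every split (plain bagged trees);
# smaller values restrict each split to a random subset of the features.
for max_features in [1.0, "sqrt", 0.33]:
    rf = RandomForestRegressor(
        n_estimators=100, max_features=max_features, random_state=42, n_jobs=-1
    )
    rf.fit(X_train, y_train)
    mse[max_features] = mean_squared_error(y_test, rf.predict(X_test))
    print(f"max_features={max_features!r}: MSE = {mse[max_features]:.4f}")
```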
