3 Exercise - Ensemble Methods and Hyperparameter Tuning. Using the Wine Dataset from scikit-learn

Implement Classification Models: • Train a Decision Tree Classifier and a Random Forest Classifier using scikit-learn. • Compare the models based on their F1 scores.

In [1]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

wine = load_wine()
X = wine.data
y = wine.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Decision Tree
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
dt_preds = dt.predict(X_test)
print("Decision Tree F1:", f1_score(y_test, dt_preds, average='weighted'))

# Random Forest
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
rf_preds = rf.predict(X_test)
print("Random Forest F1:", f1_score(y_test, rf_preds, average='weighted'))

Decision Tree F1: 0.9439974457215836
Random Forest F1: 1.0
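
The scores above come from a single held-out split of 36 samples, so one or two misclassifications move the F1 noticeably. As a rough cross-check, the sketch below compares the same two models with stratified 5-fold cross-validation over the full dataset. The fold count, the shuffling, and the `cv_scores` dictionary are choices made for this illustration and are not part of the exercise.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Assumption for this sketch: 5 stratified, shuffled folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Collect the per-fold weighted F1 for each model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_weighted")
    cv_scores[name] = scores
    print(f"{name} F1: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

The mean and standard deviation across folds give a steadier picture of the gap between the two models than one split does.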


Hyperparameter Tuning: • Identify three hyperparameters of the Random Forest Classifier. • Perform hyperparameter tuning using GridSearchCV to optimize these parameters. • Take hints from the scikit-learn documentation to guide the implementation.

In [2]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5],
    'min_samples_split': [2, 5]
}

grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, scoring='f1_weighted', cv=3)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
best_rf = grid.best_estimator_
print("F1 with Best RF:", f1_score(y_test, best_rf.predict(X_test), average='weighted'))

Best Parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}
F1 with Best RF: 1.0
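
`best_params_` reports only the winning combination. To see how close the other candidates were, the fitted search also exposes `cv_results_`. The sketch below repeats the same grid search and prints all eight combinations ordered by rank; the loop and its formatting are illustrative only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1_weighted",
    cv=3,
)
grid.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per parameter combination.
results = grid.cv_results_
order = results["rank_test_score"].argsort(kind="stable")

for i in order:
    print(
        f"rank={results['rank_test_score'][i]} "
        f"mean_f1={results['mean_test_score'][i]:.3f} "
        f"std={results['std_test_score'][i]:.3f} "
        f"params={results['params'][i]}"
    )
```

If several combinations share nearly the same mean score, the choice among them is largely arbitrary on a dataset this small.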


Implement Regression Model: • Train a Decision Tree Regressor and a Random Forest Regressor using scikit-learn. • Identify three hyperparameters of the Random Forest Regressor and perform hyperparameter tuning using RandomizedSearchCV to optimize them.

In [3]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

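# The Wine dataset has no continuous target, so use the first feature
# (wine.feature_names[0], 'alcohol') as the regression target and the
# remaining 12 features as inputs.
X_reg = wine.data[:, 1:]
y_reg = wine.data[:, 0]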

X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Decision Tree Regressor
dt_r = DecisionTreeRegressor(random_state=42)
dt_r.fit(X_train_r, y_train_r)
dt_r_preds = dt_r.predict(X_test_r)  # separate name so the classifier predictions are not overwritten
print("Decision Tree MSE:", mean_squared_error(y_test_r, dt_r_preds))

# Random Forest Regressor
rf_r = RandomForestRegressor(random_state=42)
rf_r.fit(X_train_r, y_train_r)
rf_r_preds = rf_r.predict(X_test_r)  # separate name so the classifier predictions are not overwritten
print("Random Forest MSE:", mean_squared_error(y_test_r, rf_r_preds))

Decision Tree MSE: 0.31197222222222226
Random Forest MSE: 0.15426672999999946


In [4]:
from sklearn.model_selection import RandomizedSearchCV

params = {
    'n_estimators': [50, 100],
    'max_depth': [None, 5],
    'min_samples_leaf': [1, 2]
}

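# No scoring is passed, so candidates are ranked by the regressor's default score (R^2), not MSE.
# No random_state is passed to the search, so the 4 sampled combinations can vary between runs.
rand_search = RandomizedSearchCV(RandomForestRegressor(random_state=42),
                                 param_distributions=params, n_iter=4, cv=3)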
rand_search.fit(X_train_r, y_train_r)

best_rf_r = rand_search.best_estimator_
print("Best Parameters:", rand_search.best_params_)
print("MSE with Best RF:", mean_squared_error(y_test_r, best_rf_r.predict(X_test_r)))

Best Parameters: {'n_estimators': 50, 'min_samples_leaf': 1, 'max_depth': None}
MSE with Best RF: 0.15144571888888864
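
The search above draws 4 of the 8 possible combinations from fixed lists. A variant worth considering, sketched below, samples integer hyperparameters from `scipy.stats.randint` distributions, ranks candidates by negative MSE so the selection criterion matches the reported metric, and fixes `random_state` on the search for reproducibility. The ranges, `n_iter=10`, and the name `param_distributions` are assumptions chosen for illustration, not values from the exercise.

```python
from scipy.stats import randint
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

wine = load_wine()
X_reg = wine.data[:, 1:]   # all features except alcohol
y_reg = wine.data[:, 0]    # alcohol as the regression target

X_train_r, X_test_r, y_train_r, y_test_r = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)

# Illustrative ranges; randint(low, high) samples integers in [low, high).
param_distributions = {
    "n_estimators": randint(50, 201),
    "max_depth": [None, 5, 10],
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,
    scoring="neg_mean_squared_error",  # higher is better, so MSE is negated
    cv=3,
    random_state=42,                   # makes the sampled candidates repeatable
)
search.fit(X_train_r, y_train_r)

test_mse = mean_squared_error(y_test_r, search.predict(X_test_r))
print("Best Parameters:", search.best_params_)
print("Best CV MSE:", -search.best_score_)
print("Test MSE:", test_mse)
```

Because `best_score_` is a negated MSE, its sign is flipped before printing so it can be read on the same scale as the test MSE.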
