**3 Exercise - Ensemble Methods and Hyperparameter Tuning.**

Using the Wine Dataset from scikit-learn

**1. Implement Classification Models:**

• Train a Decision Tree Classifier and a Random Forest Classifier using scikit-learn.

• Compare the models based on their F1 scores.

---



In [None]:
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import f1_score, mean_squared_error
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

In [None]:
# Load dataset
wine = load_wine()
X = wine.data
y = wine.target


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [None]:
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Predictions
dt_pred = dt_model.predict(X_test)

# F1 Score
dt_f1 = f1_score(y_test, dt_pred, average='weighted')
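The cells here pass `average='weighted'` to `f1_score`. As a small illustration of what that option computes, the sketch below uses made-up toy labels (not the Wine data) and checks that the weighted F1 equals the per-class F1 scores averaged with each class's support as the weight.

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy labels chosen only for illustration (three classes, uneven support).
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2])

# One F1 value per class, in label order 0, 1, 2.
per_class_f1 = f1_score(y_true, y_pred, average=None)

# Support = number of true samples in each class.
support = np.bincount(y_true)

# Weighted F1 = support-weighted mean of the per-class F1 values.
manual_weighted_f1 = np.sum(per_class_f1 * support) / support.sum()
sklearn_weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print("Per-class F1:", per_class_f1)
print("Manual weighted F1: ", manual_weighted_f1)
print("sklearn weighted F1:", sklearn_weighted_f1)
```

Because the weights follow class frequency, a large class influences the weighted score more than a small one; `average='macro'` would weight all classes equally instead.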


In [None]:
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# Predictions
rf_pred = rf_model.predict(X_test)

# F1 Score
rf_f1 = f1_score(y_test, rf_pred, average='weighted')


In [None]:
print("Decision Tree F1 Score:", dt_f1)
print("Random Forest F1 Score:", rf_f1)


Decision Tree F1 Score: 0.9439974457215836
Random Forest F1 Score: 1.0
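With `test_size=0.2`, the test set holds only 36 of the 178 samples, so a single split can flatter or penalize either model. One possible way to make the comparison more robust is to average the weighted F1 over several stratified folds. The sketch below is an illustration rather than part of the exercise; the fold count and the names `cv` and `cv_f1` are assumptions.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Collect one weighted F1 score per fold for each model.
cv_f1 = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_weighted")
    cv_f1[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

The standard deviation across folds gives a rough sense of how much of the gap between the two models could be due to the particular split.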


**2. Hyperparameter Tuning:**

• Identify three hyperparameters of the Random Forest Classifier.

• Perform hyperparameter tuning using GridSearchCV to optimize these parameters.

• Take hints from the scikit-learn documentation to guide the implementation.

---



In [None]:
wine = load_wine()
X = wine.data
y = wine.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [None]:
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10],
    'min_samples_split': [2, 5, 10]
}

In [None]:
rf = RandomForestClassifier(random_state=42)

grid_search = GridSearchCV(
    estimator=rf,
    param_grid=param_grid,
    scoring='f1_weighted',
    cv=5
)

grid_search.fit(X_train, y_train)

In [None]:
print("Best Parameters:", grid_search.best_params_)

Best Parameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 100}


In [None]:
best_rf = grid_search.best_estimator_

y_pred = best_rf.predict(X_test)
best_f1 = f1_score(y_test, y_pred, average='weighted')

print("Optimized Random Forest F1 Score:", best_f1)

Optimized Random Forest F1 Score: 1.0
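The best parameters reported above coincide with scikit-learn's defaults for `RandomForestClassifier` (`n_estimators=100`, `max_depth=None`, `min_samples_split=2`), so the tuned model is the same model as in part 1 and the identical F1 score is expected. To see how close the other candidates were, the fitted search object's `cv_results_` can be inspected. The sketch below uses a reduced grid (an assumption, only to keep it quick to run) and the hypothetical names `small_grid`, `search`, and `order`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Reduced grid (assumption): 2 x 2 x 2 = 8 candidates instead of 27.
small_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    scoring="f1_weighted",
    cv=5,
)
search.fit(X_train, y_train)

# cv_results_ holds one entry per candidate; rank 1 is the best mean score.
results = search.cv_results_
order = np.argsort(results["rank_test_score"])

for i in order[:3]:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{mean:.4f} +/- {std:.4f}  {results['params'][i]}")
```

If several candidates score within one standard deviation of each other, the grid is not separating them, and the simpler or cheaper setting is usually a reasonable choice.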


**3. Implement Regression Model:**

• Train a Decision Tree Regressor and a Random Forest Regressor using scikit-learn.

• Identify three hyperparameters of the Random Forest Regressor and perform hyperparameter tuning using
RandomizedSearchCV to optimize these parameters.

---



In [None]:
# Load dataset
wine = load_wine()
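X = wine.data
# Target: alcohol content (continuous value). Note that this column is
# still part of X, so the models can read the target directly from the
# features and the MSE values below are optimistic.
y = wine.data[:, 0]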
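In the cell above, `y` is column 0 of `wine.data` while `X` still contains that same column, so the regressors can recover the target from the features. A minimal sketch of one way to avoid this, assuming alcohol remains the target, is to drop that column from the feature matrix. The names `target_idx`, `X_no_alcohol`, and `y_alcohol` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Look up the target column by name instead of hard-coding the index.
target_idx = wine.feature_names.index("alcohol")

# Target: the alcohol column. Features: every other column.
y_alcohol = wine.data[:, target_idx]
X_no_alcohol = np.delete(wine.data, target_idx, axis=1)

print("Features:", X_no_alcohol.shape)
print("Target:  ", y_alcohol.shape)
```

Models trained on `X_no_alcohol` will generally report a larger MSE than the values in this section, since they have to predict alcohol from the remaining measurements.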

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

In [None]:
dt_reg = DecisionTreeRegressor(random_state=42)
dt_reg.fit(X_train, y_train)

dt_pred = dt_reg.predict(X_test)
dt_mse = mean_squared_error(y_test, dt_pred)

print("Decision Tree Regressor MSE:", dt_mse)

Decision Tree Regressor MSE: 0.001705555555555562


In [None]:
rf_reg = RandomForestRegressor(random_state=42)
rf_reg.fit(X_train, y_train)

rf_pred = rf_reg.predict(X_test)
rf_mse = mean_squared_error(y_test, rf_pred)

print("Random Forest Regressor MSE:", rf_mse)

Random Forest Regressor MSE: 0.0011140116666664994


In [None]:
param_dist = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 5, 10, 15],
    'min_samples_split': [2, 5, 10]
}

In [None]:
rf = RandomForestRegressor(random_state=42)

random_search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_dist,
    n_iter=10,
    scoring='neg_mean_squared_error',
    cv=5,
    random_state=42
)

random_search.fit(X_train, y_train)

In [None]:
print("Best Parameters:", random_search.best_params_)

Best Parameters: {'n_estimators': 50, 'min_samples_split': 5, 'max_depth': 15}


In [None]:
best_rf_reg = random_search.best_estimator_

best_pred = best_rf_reg.predict(X_test)
best_mse = mean_squared_error(y_test, best_pred)

print("Optimized Random Forest MSE:", best_mse)

Optimized Random Forest MSE: 0.0007135711295087991
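The search above passes lists of values, so `RandomizedSearchCV` draws 10 of the 36 possible combinations. It also accepts distributions, which lets it sample values that a fixed grid would not contain. The sketch below is one possible variant, not the exercise's required solution: the ranges, `n_iter=5` (chosen to keep it quick), and the names `param_dist_cont`, `search`, and `cv_mse` are assumptions, and the alcohol column is dropped from the features so the target is not also an input.

```python
from scipy.stats import randint
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split

wine = load_wine()
X = wine.data[:, 1:]   # all features except alcohol (assumption)
y = wine.data[:, 0]    # alcohol as the regression target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# randint(low, high) samples integers in [low, high); lists are sampled uniformly.
param_dist_cont = {
    "n_estimators": randint(50, 201),
    "max_depth": [None, 5, 10, 15],
    "min_samples_split": randint(2, 11),
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions=param_dist_cont,
    n_iter=5,
    scoring="neg_mean_squared_error",
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

# best_score_ is the negated MSE, so flip the sign to report an MSE.
cv_mse = -search.best_score_
test_mse = mean_squared_error(y_test, search.best_estimator_.predict(X_test))

print("Best Parameters:", search.best_params_)
print("Cross-validated MSE:", cv_mse)
print("Test MSE:", test_mse)
```

Comparing the cross-validated MSE with the test MSE gives a quick check on whether the selected parameters generalize beyond the folds used for tuning.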
