# Hyperparameter Tuning and Cross-Validation with Random Forest Regressor

In this notebook, we will perform **Hyperparameter Tuning** and **Cross-Validation (CV)** using a **RandomForestRegressor** to predict the `SalePrice` of homes from the Ames Housing dataset. We will optimize the model using grid search for hyperparameters and evaluate the results using cross-validation.

## 1. Introduction to Hyperparameter Tuning and Cross-Validation

### Hyperparameter Tuning:
- **Hyperparameters** are the parameters that are set before the model training process begins. These include parameters like the number of trees in a Random Forest or the depth of a decision tree.
- The goal of **hyperparameter tuning** is to find the best set of hyperparameters that improves the model's performance.

### Cross-Validation (CV):
- **Cross-validation** is a technique to evaluate machine learning models by training them on different subsets of the data and evaluating them on the remaining parts. This helps ensure that the model is not overfitting to a specific subset of data and can generalize well to unseen data.

### Search Space for Hyperparameters:
The **search space** defines the possible values for each hyperparameter. We will define the following hyperparameters for **RandomForestRegressor**:
- `n_estimators`: The number of trees in the forest.
- `max_depth`: The maximum depth of each tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required at each leaf node.

We will use **GridSearchCV** to search over this space.

---

## 2. Define the Search Space for Hyperparameters

Before we perform **hyperparameter tuning**, we need to define the **search space**. This search space will consist of several values for each hyperparameter, and **GridSearchCV** will evaluate all combinations.

The hyperparameters we will tune include:
- `n_estimators`: Number of trees in the forest.
- `max_depth`: Maximum depth of the tree.
- `min_samples_split`: Minimum number of samples required to split an internal node.
- `min_samples_leaf`: Minimum number of samples at each leaf node.

```python
param_grid = {
    'model__n_estimators': [50, 100, 200],  # Number of trees in the forest
    'model__max_depth': [10, 20, None],  # Maximum depth of each tree
    'model__min_samples_split': [2, 5, 10],  # Minimum samples to split a node
    'model__min_samples_leaf': [1, 2,4]  # Minimum samples at a leaf node
}
s required at a leaf node
}


# 3. Possible Number of Results

The total number of results possible from the grid search depends on the number of combinations of hyperparameters in the search space. For each hyperparameter, we define a set of possible values.

For example, if we have the following grid:

- `n_estimators`: [50, 100, 200] (3 values)
- `max_depth`: [10, 20, None] (3 values)
- `min_samples_split`: [2, 5, 10] (3 values)
- `min_samples_leaf`: [1, 2, 4] (3 values)

The total number of possible combinations is:

\[
3 \times 3 \times 3 \times 3 = 81 \text{ combinations}
\]

Thus, **81 different results** will be evaluated by the grid search.

If you have more hyperparameters or more values per hyperparameter, the number of possible combinations increases exponentially.

You can calculate the total combinations as the product of the number of values for each hyperparameter.


# 4. Code: Hyperparameter Tuning with GridSearch

**Steps Included In the code**

Step 1: Importing the Libraries

*   **pandas:** For handling and processing data.
*   **sklearn:** A popular library for machine learning tasks (like training models, preprocessing data, etc.).

*   **tqdm:** Used for showing a progress bar (so you can see how far along a task is).
*   **numpy:** For numerical operations, especially with arrays.

Step 2: Loading and Exploring the Data:

*   **df = pd.read_csv('AmesHousing.csv') :** This line loads the dataset into a pandas DataFrame (a table-like data structure).
*  **print(df.head()) :** Displays the first few rows of the dataset so you can see what it looks like.


Step 3: Defining Features and Target Variable:


*   **features:** This is the part of the dataset that contains the input data (everything except the target column SalePrice).

*  **target:** This is the column we're trying to predict—SalePrice (the price of the house).

Step 4: Identifying Categorical and Numerical Features:


*   **categorical_features:** These are the columns that contain categorical data (like "Neighborhood" or "GarageFinish").

*   **numerical_features:** These are the columns with numeric data (like "LotArea" or "YearBuilt").

Step 5: Creating Preprocessing Pipelines :

This section defines how the data will be processed before being fed into the model.

*   **For numerical data:**


1.   **Imputer:** Fills in missing values with the mean of each column.

2.   **Scaler:** Scales the numerical features so they have a standard range (important for some algorithms).


*   **For categorical data:**


1.  **Imputer:** Fills in missing values with the string 'missing'.

2.  **OneHotEncoder:** Converts categorical values into a format that the model can understand by creating binary columns for each category.

Step 6 : Combining Preprocessing Steps :



*   **ColumnTransformer:** This applies the numerical and categorical transformations to the respective features in the dataset.

Step 7 : Splitting the Data into Training and Testing Sets:

*   **train_test_split:** This splits the data into training and test sets. 80% of the data will be used to train the model, and 20% will be used for testing.

Step 8 : Defining the Model:  
*   **RandomForestRegressor:** This is a machine learning model that can predict numerical values (like house prices). It works by combining multiple decision trees to make predictions.

Step 9 : Creating a Full Pipeline:



*  **Pipeline:** Combines both preprocessing and modeling steps into one pipeline, making it easier to handle and fit the model.

Step 10 : Setting Up Hyperparameter Tuning:   
*   **param_grid:** Specifies the hyperparameters to test during grid search. These settings control the behavior of the RandomForest model, such as the number of trees (**n_estimators**), tree depth (**max_depth**), and how the trees split and leaf nodes are formed.

Step 11 : Custom Grid Search with Progress Bar:



*   **custom_grid_search:** This function manually runs a grid search to try different hyperparameter combinations from the param_grid. It also uses the tqdm progress bar to track how far along the search is.
*   The function loops through all combinations of hyperparameters, fits the pipeline with each one, and records the R² score for each configuration.

Step 12 : Running the Grid Search :
*   **custom_grid_search(pipeline, param_grid, X_train, y_train):** This line calls the grid search function, which tests all possible combinations of hyperparameters defined in param_grid.

Step 13 : Finding the Best Model::

*   **max(grid_search_results, key=lambda x: x[1]):** Finds the hyperparameter combination that results in the highest R² score (best model performance).
*   The best hyperparameters and their corresponding R² score are printed.
*   **Mean Squared Error:** Measures the average squared difference between predicted and actual values (lower is better).

Step 14 : Final Model Evaluation on Test Set:

*   **pipeline.set_params(**best_params):** The pipeline is updated to use the best hyperparameters found during the grid search.
*   **y_pred = pipeline.predict(X_test):** The final model is used to make predictions on the test set.
*   **Evaluation:** The performance of the model is evaluated using the R² score (how well the model's predictions match the true values) and Mean Squared Error (a measure of the average error).

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from tqdm import tqdm
import numpy as np

# Load Ames Housing dataset from the local CSV file
df = pd.read_csv('AmesHousing.csv')

# Check the first few rows of the dataset
print(df.head())

# Define features and target variable
features = df.drop(['SalePrice'], axis=1)
target = df['SalePrice']

# Handle missing values and categorical variables
categorical_features = features.select_dtypes(include=['object']).columns
numerical_features = features.select_dtypes(exclude=['object']).columns

# Preprocessing pipeline for numerical and categorical features
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Combine transformations for different feature types
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Define the model
model = RandomForestRegressor(random_state=42)

# Create a pipeline that first preprocesses the data then applies the model
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('model', model)])

# Define hyperparameters for tuning
param_grid = {
    'model__n_estimators': [50, 100, 200],
    'model__max_depth': [10, 20, None],
    'model__min_samples_split': [2, 5, 10],
    'model__min_samples_leaf': [1, 2, 4]
}

# Custom GridSearchCV with tqdm for progress bar
def custom_grid_search(pipeline, param_grid, X_train, y_train, cv=5):
    param_list = [dict(zip(param_grid, x)) for x in np.array(np.meshgrid(*param_grid.values())).T.reshape(-1, len(param_grid))]
    results = []

    for params in tqdm(param_list, desc="GridSearch Progress"):
        pipeline.set_params(**params)
        pipeline.fit(X_train, y_train)
        score = pipeline.score(X_train, y_train)  # R² score or any other metric
        results.append((params, score))

    return results

# Run the custom grid search with progress bar
grid_search_results = custom_grid_search(pipeline, param_grid, X_train, y_train)

# Find the best model
best_params, best_score = max(grid_search_results, key=lambda x: x[1])
print("Best Hyperparameters:", best_params)
print("Best R² Score:", best_score)

# Get the best model from the pipeline
pipeline.set_params(**best_params)

# Make predictions on the test set
y_pred = pipeline.predict(X_test)

# Print the evaluation metrics
print("R² Score:", r2_score(y_test, y_pred))
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

   Order        PID  MS SubClass MS Zoning  Lot Frontage  Lot Area Street  \
0      1  526301100           20        RL         141.0     31770   Pave   
1      2  526350040           20        RH          80.0     11622   Pave   
2      3  526351010           20        RL          81.0     14267   Pave   
3      4  526353030           20        RL          93.0     11160   Pave   
4      5  527105010           60        RL          74.0     13830   Pave   

  Alley Lot Shape Land Contour  ... Pool Area Pool QC  Fence Misc Feature  \
0   NaN       IR1          Lvl  ...         0     NaN    NaN          NaN   
1   NaN       Reg          Lvl  ...         0     NaN  MnPrv          NaN   
2   NaN       IR1          Lvl  ...         0     NaN    NaN         Gar2   
3   NaN       Reg          Lvl  ...         0     NaN    NaN          NaN   
4   NaN       IR1          Lvl  ...         0     NaN  MnPrv          NaN   

  Misc Val Mo Sold Yr Sold Sale Type  Sale Condition  SalePrice  
0       

GridSearch Progress: 100%|██████████| 81/81 [34:54<00:00, 25.85s/it]

Best Hyperparameters: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
Best R² Score: 0.9837748806499791
R² Score: 0.9029640257779548
Mean Squared Error: 777990574.3475772





# 5. Code: Hyperparameter Tuning with GridSearch
**(Streaming Results and Progress Bar)**

Each time a parameter combination is evaluated, we **print** the current hyperparameters and the corresponding R² score. This allows us to view the performance for each parameter set as the grid search progresses, without having to wait for all iterations to finish.

### `yield`

We use the `yield` keyword within the `custom_grid_search()` function to return each result one by one. This allows us to **stream** each result as soon as it is computed, rather than waiting for all iterations to complete. By using `yield`, we can pause the function and resume it at each iteration, giving us immediate feedback on the current search parameters and their performance.

### Progress Bar

A **progress bar** is displayed using the `tqdm` library. It updates as the grid search iterates, giving a visual indication of how far along the search process is. The progress bar provides a helpful way to track the grid search's progress and ensures that the user is aware of how long the process will take.

**Steps Included In the code**

Step 1: Importing the Libraries

*   **pandas:** For handling and processing data.
*   **sklearn:** A popular library for machine learning tasks (like training models, preprocessing data, etc.).

*   **tqdm:** Used for showing a progress bar (so you can see how far along a task is).
*   **numpy:** For numerical operations, especially with arrays.

Step 2: Loading and Exploring the Data:

*   **df = pd.read_csv('AmesHousing.csv') :** This line loads the dataset into a pandas DataFrame (a table-like data structure).
*  **print(df.head()) :** Displays the first few rows of the dataset so you can see what it looks like.


Step 3: Defining Features and Target Variable:


*   **features:** This is the part of the dataset that contains the input data (everything except the target column SalePrice).

*  **target:** This is the column we're trying to predict—SalePrice (the price of the house).

Step 4: Handling Missing Data and Categorical Variables:


*   **categorical_features:** These are the columns that contain categorical data (like "Neighborhood" or "GarageFinish").

*   **numerical_features:** These are the columns with numeric data (like "LotArea" or "YearBuilt").

Step 5: Creating Preprocessing Pipelines :

This section defines how the data will be processed before being fed into the model.

*   **For numerical data:**


1.   **Imputer:** Fills in missing values with the mean of each column.

2.   **Scaler:** Scales the numerical features so they have a standard range (important for some algorithms).


*   **For categorical data:**


1.  **Imputer:** Fills in missing values with the string 'missing'.

2.  **OneHotEncoder:** Converts categorical values into a format that the model can understand by creating binary columns for each category.

Step 6 : Combining Preprocessing Steps for Different Features:



*   **ColumnTransformer:** This step applies the specific preprocessing steps to different types of features (numerical or categorical) based on the column type.

Step 7 : Splitting the Data into Training and Testing Sets:

*   **train_test_split:** This splits the data into training data (used to train the model) and test data (used to evaluate the model). The test size is 20% of the data, and the random state ensures the split is reproducible.

Step 8 : Defining the Model:  
*   **RandomForestRegressor:** A machine learning model that works well for regression tasks like predicting house prices.

Step 9 : Creating a Full Pipeline:



*  **Pipeline:** Combines preprocessing and model training steps into a single object. This makes it easier to apply transformations and training in one go.

Step 10 : Setting Up Hyperparameter Tuning:   
*   **param_grid:** Specifies the hyperparameters (settings) to try when tuning the model. The values will be tested to find the best combination for predicting house prices.

Step 11 : Custom Grid Search with Progress Bar:



*   **custom_grid_search:** This function manually runs the grid search, which tests different combinations of hyperparameters from **param_grid** to find the best one. It also uses **tqdm** to show a progress bar during the search.


Step 12 : Running the Grid Search and Tracking the Best Model:
*   This part loops through all combinations of hyperparameters, trains the model with each one, evaluates it, and tracks the combination that gives the best performance.

Step 13 : Evaluating the Best Model:

*   After finding the best hyperparameters, the model is trained again with those settings, and its performance is evaluated on the **test set**.
*   **R² Score:** Tells you how well the model's predictions match the actual values (the closer to 1, the better).
*   **Mean Squared Error:** Measures the average squared difference between predicted and actual values (lower is better).

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from tqdm import tqdm
import numpy as np

# Load Ames Housing dataset from the local CSV file
df = pd.read_csv('AmesHousing.csv')

# Check the first few rows of the dataset
print(df.head())

# Define features and target variable
features = df.drop(['SalePrice'], axis=1)
target = df['SalePrice']

# Handle missing values and categorical variables
categorical_features = features.select_dtypes(include=['object']).columns
numerical_features = features.select_dtypes(exclude=['object']).columns

# Preprocessing pipeline for numerical and categorical features
numerical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Combine transformations for different feature types
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transformer, numerical_features),
        ('cat', categorical_transformer, categorical_features)
    ])

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Define the model
model = RandomForestRegressor(random_state=42)

# Create a pipeline that first preprocesses the data then applies the model
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('model', model)])

# Define hyperparameters for tuning
param_grid = {
    'model__n_estimators': [50, 100, 200],
    'model__max_depth': [10, 20, None],
    'model__min_samples_split': [2, 5, 10],
    'model__min_samples_leaf': [1, 2, 4]
}

# Custom GridSearchCV with tqdm for progress bar and streaming results
def custom_grid_search(pipeline, param_grid, X_train, y_train, cv=5):
    param_list = [dict(zip(param_grid, x)) for x in np.array(np.meshgrid(*param_grid.values())).T.reshape(-1, len(param_grid))]

    for params in tqdm(param_list, desc="GridSearch Progress", position=0):
        pipeline.set_params(**params)
        pipeline.fit(X_train, y_train)

        # Evaluate model performance
        score = pipeline.score(X_train, y_train)  # R² score or other metric

        # Stream results after each iteration
        print(f"Evaluating params: {params}")
        print(f"R² Score for current parameters: {score:.4f}")

        # Optional: Record the results for further processing or selecting the best model
        yield params, score

# Run the custom grid search with progress bar and streamed results
best_score = -np.inf
best_params = None

for params, score in custom_grid_search(pipeline, param_grid, X_train, y_train):
    # Track the best model
    if score > best_score:
        best_score = score
        best_params = params

# Print the best parameters after all iterations
print("\nBest Hyperparameters:", best_params)
print("Best R² Score:", best_score)

# Set the best model in the pipeline and evaluate on the test set
pipeline.set_params(**best_params)
y_pred = pipeline.predict(X_test)

# Print the evaluation metrics on the test set
print("R² Score on test set:", r2_score(y_test, y_pred))
print("Mean Squared Error on test set:", mean_squared_error(y_test, y_pred))


   Order        PID  MS SubClass MS Zoning  Lot Frontage  Lot Area Street  \
0      1  526301100           20        RL         141.0     31770   Pave   
1      2  526350040           20        RH          80.0     11622   Pave   
2      3  526351010           20        RL          81.0     14267   Pave   
3      4  526353030           20        RL          93.0     11160   Pave   
4      5  527105010           60        RL          74.0     13830   Pave   

  Alley Lot Shape Land Contour  ... Pool Area Pool QC  Fence Misc Feature  \
0   NaN       IR1          Lvl  ...         0     NaN    NaN          NaN   
1   NaN       Reg          Lvl  ...         0     NaN  MnPrv          NaN   
2   NaN       IR1          Lvl  ...         0     NaN    NaN         Gar2   
3   NaN       Reg          Lvl  ...         0     NaN    NaN          NaN   
4   NaN       IR1          Lvl  ...         0     NaN  MnPrv          NaN   

  Misc Val Mo Sold Yr Sold Sale Type  Sale Condition  SalePrice  
0       

GridSearch Progress:   1%|▊                                                             | 1/81 [00:16<21:59, 16.49s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9738


GridSearch Progress:   2%|█▌                                                            | 2/81 [00:47<32:42, 24.84s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9823


GridSearch Progress:   4%|██▎                                                           | 3/81 [01:17<35:33, 27.35s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9821


GridSearch Progress:   5%|███                                                           | 4/81 [01:50<37:45, 29.42s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9747


GridSearch Progress:   6%|███▊                                                          | 5/81 [02:49<51:05, 40.34s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9832


GridSearch Progress:   7%|████▌                                                         | 6/81 [03:52<59:57, 47.97s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9833


GridSearch Progress:   9%|█████▏                                                      | 7/81 [04:57<1:06:04, 53.57s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9751


GridSearch Progress:  10%|█████▉                                                      | 8/81 [06:59<1:31:38, 75.32s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9836


GridSearch Progress:  11%|██████▋                                                     | 9/81 [09:01<1:47:48, 89.84s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9836


GridSearch Progress:  12%|███████▎                                                   | 10/81 [09:12<1:17:40, 65.64s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9696


GridSearch Progress:  14%|████████▎                                                    | 11/81 [09:30<59:29, 51.00s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9771


GridSearch Progress:  15%|█████████                                                    | 12/81 [09:48<47:09, 41.01s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9774


GridSearch Progress:  16%|█████████▊                                                   | 13/81 [10:13<40:47, 35.99s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9704


GridSearch Progress:  17%|██████████▌                                                  | 14/81 [10:48<39:58, 35.80s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9782


GridSearch Progress:  19%|███████████▎                                                 | 15/81 [11:28<40:45, 37.06s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9782


GridSearch Progress:  20%|████████████                                                 | 16/81 [12:19<44:45, 41.32s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9711


GridSearch Progress:  21%|████████████▊                                                | 17/81 [13:43<57:33, 53.97s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9789


GridSearch Progress:  22%|█████████████                                              | 18/81 [15:13<1:08:14, 64.99s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9790


GridSearch Progress:  23%|██████████████▎                                              | 19/81 [15:27<51:13, 49.57s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9633


GridSearch Progress:  25%|███████████████                                              | 20/81 [15:47<41:18, 40.63s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9681


GridSearch Progress:  26%|███████████████▊                                             | 21/81 [16:07<34:24, 34.41s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9684


GridSearch Progress:  27%|████████████████▌                                            | 22/81 [16:34<31:48, 32.34s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9637


GridSearch Progress:  28%|█████████████████▎                                           | 23/81 [17:12<32:53, 34.03s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9694


GridSearch Progress:  30%|██████████████████                                           | 24/81 [17:51<33:50, 35.63s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9691


GridSearch Progress:  31%|██████████████████▊                                          | 25/81 [18:47<38:41, 41.46s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9642


GridSearch Progress:  32%|███████████████████▌                                         | 26/81 [20:02<47:15, 51.56s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9701


GridSearch Progress:  33%|████████████████████▎                                        | 27/81 [21:16<52:30, 58.34s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 1}
R² Score for current parameters: 0.9700


GridSearch Progress:  35%|█████████████████████                                        | 28/81 [21:31<39:59, 45.27s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9674


GridSearch Progress:  36%|█████████████████████▊                                       | 29/81 [21:52<33:06, 38.20s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9746


GridSearch Progress:  37%|██████████████████████▌                                      | 30/81 [22:15<28:27, 33.48s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9747


GridSearch Progress:  38%|███████████████████████▎                                     | 31/81 [22:41<26:06, 31.34s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9687


GridSearch Progress:  40%|████████████████████████                                     | 32/81 [23:23<28:04, 34.38s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9756


GridSearch Progress:  41%|████████████████████████▊                                    | 33/81 [24:05<29:31, 36.91s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9755


GridSearch Progress:  42%|█████████████████████████▌                                   | 34/81 [25:01<33:22, 42.61s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9694


GridSearch Progress:  43%|██████████████████████████▎                                  | 35/81 [26:22<41:29, 54.12s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9764


GridSearch Progress:  44%|███████████████████████████                                  | 36/81 [27:47<47:33, 63.41s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9766


GridSearch Progress:  46%|███████████████████████████▊                                 | 37/81 [28:02<35:51, 48.91s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9667


GridSearch Progress:  47%|████████████████████████████▌                                | 38/81 [28:23<28:57, 40.41s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9730


GridSearch Progress:  48%|█████████████████████████████▎                               | 39/81 [28:43<24:00, 34.29s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9729


GridSearch Progress:  49%|██████████████████████████████                               | 40/81 [29:10<22:00, 32.20s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9673


GridSearch Progress:  51%|██████████████████████████████▉                              | 41/81 [29:49<22:46, 34.16s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9739


GridSearch Progress:  52%|███████████████████████████████▋                             | 42/81 [30:30<23:30, 36.17s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9739


GridSearch Progress:  53%|████████████████████████████████▍                            | 43/81 [31:26<26:46, 42.28s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9680


GridSearch Progress:  54%|█████████████████████████████████▏                           | 44/81 [32:47<33:12, 53.86s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9747


GridSearch Progress:  56%|█████████████████████████████████▉                           | 45/81 [34:10<37:28, 62.45s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9747


GridSearch Progress:  57%|██████████████████████████████████▋                          | 46/81 [34:22<27:41, 47.46s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9592


GridSearch Progress:  58%|███████████████████████████████████▍                         | 47/81 [34:40<21:46, 38.42s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9645


GridSearch Progress:  59%|████████████████████████████████████▏                        | 48/81 [34:56<17:27, 31.74s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9645


GridSearch Progress:  60%|████████████████████████████████████▉                        | 49/81 [35:20<15:39, 29.37s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9602


GridSearch Progress:  62%|█████████████████████████████████████▋                       | 50/81 [35:52<15:41, 30.36s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9654


GridSearch Progress:  63%|██████████████████████████████████████▍                      | 51/81 [36:24<15:26, 30.87s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9653


GridSearch Progress:  64%|███████████████████████████████████████▏                     | 52/81 [37:11<17:12, 35.61s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9609


GridSearch Progress:  65%|███████████████████████████████████████▉                     | 53/81 [38:08<19:34, 41.93s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9661


GridSearch Progress:  67%|████████████████████████████████████████▋                    | 54/81 [39:05<20:55, 46.51s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 2}
R² Score for current parameters: 0.9660


GridSearch Progress:  68%|█████████████████████████████████████████▍                   | 55/81 [39:16<15:29, 35.77s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9534


GridSearch Progress:  69%|██████████████████████████████████████████▏                  | 56/81 [39:29<12:08, 29.14s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9573


GridSearch Progress:  70%|██████████████████████████████████████████▉                  | 57/81 [39:43<09:49, 24.55s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9573


GridSearch Progress:  72%|███████████████████████████████████████████▋                 | 58/81 [40:05<09:04, 23.68s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9546


GridSearch Progress:  73%|████████████████████████████████████████████▍                | 59/81 [40:32<09:01, 24.63s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9586


GridSearch Progress:  74%|█████████████████████████████████████████████▏               | 60/81 [41:00<09:03, 25.86s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9586


GridSearch Progress:  75%|█████████████████████████████████████████████▉               | 61/81 [41:46<10:35, 31.77s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9551


GridSearch Progress:  77%|██████████████████████████████████████████████▋              | 62/81 [42:52<13:16, 41.92s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9589


GridSearch Progress:  78%|███████████████████████████████████████████████▍             | 63/81 [43:56<14:34, 48.57s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 2, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9589


GridSearch Progress:  79%|████████████████████████████████████████████████▏            | 64/81 [44:09<10:44, 37.90s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9534


GridSearch Progress:  80%|████████████████████████████████████████████████▉            | 65/81 [44:25<08:22, 31.41s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9573


GridSearch Progress:  81%|█████████████████████████████████████████████████▋           | 66/81 [44:40<06:39, 26.60s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9573


GridSearch Progress:  83%|██████████████████████████████████████████████████▍          | 67/81 [45:02<05:51, 25.07s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9546


GridSearch Progress:  84%|███████████████████████████████████████████████████▏         | 68/81 [45:30<05:39, 26.15s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9586


GridSearch Progress:  85%|███████████████████████████████████████████████████▉         | 69/81 [45:59<05:23, 26.94s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9586


GridSearch Progress:  86%|████████████████████████████████████████████████████▋        | 70/81 [46:47<06:06, 33.31s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9551


GridSearch Progress:  88%|█████████████████████████████████████████████████████▍       | 71/81 [47:48<06:54, 41.48s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9589


GridSearch Progress:  89%|██████████████████████████████████████████████████████▏      | 72/81 [48:55<07:23, 49.22s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 5, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9589


GridSearch Progress:  90%|██████████████████████████████████████████████████████▉      | 73/81 [49:09<05:08, 38.56s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9506


GridSearch Progress:  91%|███████████████████████████████████████████████████████▋     | 74/81 [49:25<03:43, 31.90s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9543


GridSearch Progress:  93%|████████████████████████████████████████████████████████▍    | 75/81 [49:41<02:41, 26.98s/it]

Evaluating params: {'model__n_estimators': 50, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9543


GridSearch Progress:  94%|█████████████████████████████████████████████████████████▏   | 76/81 [50:04<02:08, 25.73s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9517


GridSearch Progress:  95%|█████████████████████████████████████████████████████████▉   | 77/81 [50:29<01:42, 25.69s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9554


GridSearch Progress:  96%|██████████████████████████████████████████████████████████▋  | 78/81 [50:54<01:15, 25.30s/it]

Evaluating params: {'model__n_estimators': 100, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9554


GridSearch Progress:  98%|███████████████████████████████████████████████████████████▍ | 79/81 [51:33<00:59, 29.58s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 10, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9521


GridSearch Progress:  99%|████████████████████████████████████████████████████████████▏| 80/81 [52:22<00:35, 35.38s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9558


GridSearch Progress: 100%|█████████████████████████████████████████████████████████████| 81/81 [53:11<00:00, 39.40s/it]

Evaluating params: {'model__n_estimators': 200, 'model__max_depth': None, 'model__min_samples_split': 10, 'model__min_samples_leaf': 4}
R² Score for current parameters: 0.9558

Best Hyperparameters: {'model__n_estimators': 200, 'model__max_depth': 20, 'model__min_samples_split': 2, 'model__min_samples_leaf': 1}
Best R² Score: 0.9835951627501026
R² Score on test set: 0.9030877014134127
Mean Squared Error on test set: 776998999.0124081





**Conclusion :**

In this notebook, we performed hyperparameter tuning and cross-validation using a RandomForestRegressor to predict house prices from the Ames Housing dataset. By defining a search space for key hyperparameters and using GridSearchCV, we optimized the model to improve performance. Cross-validation ensured the model's ability to generalize to new data, preventing overfitting. The tuning process allowed us to identify the best hyperparameter combination for the model. Ultimately, this approach improved model accuracy and ensured reliable predictions, making the model robust and capable of performing well on unseen data, which is essential for real-world applications.

**NOTE:**


*   **custom grid search with progress bar**
    
   code is static i.e. it takes time to output the results of all models generated
*  **custom grid search with progress bar and streamed results**

 code is with live streaming wherein model gets displayed one by one as it is generated

