# Hands-On Activity: Wine Quality Prediction using Random Forest Regression
--- 
### Introduction:

In this hands-on activity, you will use the wine dataset bundled with scikit-learn (`load_wine()`) to build a Random Forest Regression model. Note that this dataset does not contain a 0-to-10 quality score: its target is a class label (0, 1, or 2) identifying one of three cultivars, and the activity treats that label as a numeric target to predict from 13 chemical features.

### Task Overview:

1. **Data Loading and Exploration:**
   - Import the necessary libraries (already provided in the code).
   - Load the wine dataset bundled with scikit-learn using `load_wine()` and create a Pandas DataFrame.
   - Inspect the first few rows of the DataFrame to understand the data.

2. **Dataset Information:**
   - Check the information about the dataset using `df.info()`.

3. **Data Splitting and Standardization:**
   - Separate features (X) and the target variable (y).
   - Split the data into training and testing sets (use `train_test_split` with a test size of 20% and `random_state` of 42).
   - Standardize the features using `StandardScaler`, fitting it on the training set only and applying the same transformation to the test set.

4. **Random Forest Regression:**
   - Define a Random Forest Regressor model with `random_state` set to 42.
   - Perform hyperparameter tuning using `GridSearchCV` with the following parameter grid:
     - `n_estimators`: [50, 100, 200]
     - `max_depth`: [None, 10, 20]

5. **Model Evaluation:**
   - Print the best hyperparameters obtained from the grid search.
   - Use the best model to make predictions on the test set.
   - Evaluate model performance on the test set using R-squared (`r2_score`).
   - Display the results.

### Additional Guidelines:
- Follow the provided code structure and fill in the necessary code to complete each task.

Enjoy the hands-on experience, and good luck with your wine quality prediction!

In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV


In [9]:
# Load the Wine Quality dataset
wine_data = load_wine()

# Create a Pandas DataFrame from the dataset
df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)

# Add the target values to the DataFrame as a new column
df["target"] = wine_data.target

# Inspect the first few rows of the DataFrame
df.head()
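
Before modeling, it may be worth checking what the `target` column actually holds. The sketch below assumes only scikit-learn's bundled `load_wine()` data; the names `wine`, `df_check`, and `target_counts` are illustrative and not part of the activity code.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
df_check = pd.DataFrame(wine.data, columns=wine.feature_names)
df_check["target"] = wine.target

# Count how many samples fall under each distinct target value
target_counts = df_check["target"].value_counts().sort_index()
print(target_counts)

# Names that scikit-learn attaches to the target values
print(list(wine.target_names))
```

The target takes only the values 0, 1, and 2, so the regressor in this activity is fitting a numeric encoding of a class label rather than a continuous score.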


In [4]:
# Check the information about the dataset
df.info()

# Separate features (X) and target variable (y)
X = df.drop("target", axis=1)
y = df["target"]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   alcohol                       178 non-null    float64
 1   malic_acid                    178 non-null    float64
 2   ash                           178 non-null    float64
 3   alcalinity_of_ash             178 non-null    float64
 4   magnesium                     178 non-null    float64
 5   total_phenols                 178 non-null    float64
 6   flavanoids                    178 non-null    float64
 7   nonflavanoid_phenols          178 non-null    float64
 8   proanthocyanins               178 non-null    float64
 9   color_intensity               178 non-null    float64
 10  hue                           178 non-null    float64
 11  od280/od315_of_diluted_wines  178 non-null    float64
 12  proline                       178 non-null    float64
 13  targe

In [5]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
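
A quick check of the split sizes can catch mistakes early. This sketch repeats the split on `load_wine()` with the same `test_size` and `random_state`. The `stratify` variant at the end is an optional assumption for illustration, not something the activity asks for.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X_all, y_all = load_wine(return_X_y=True)

# Same settings as the activity: 20% test, fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (142, 13) (36, 13)

# Optional variant: keep the label proportions similar in both splits
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)
```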

In [6]:
# Standardize the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
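
To make explicit what the scaling step does: `StandardScaler` subtracts each column's training mean and divides by its training standard deviation, and the test set reuses those training statistics. Below is a minimal NumPy sketch on made-up numbers (the arrays are illustrative, not taken from the wine data).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
test = np.array([[2.0, 30.0]])

# Statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), which StandardScaler uses

train_manual = (train - mean) / std
test_manual = (test - mean) / std

# The manual result should match scikit-learn's transformer
demo_scaler = StandardScaler().fit(train)
assert np.allclose(train_manual, demo_scaler.transform(train))
assert np.allclose(test_manual, demo_scaler.transform(test))
```

Tree-based models such as random forests split on thresholds, so their predictions are generally insensitive to this kind of per-feature rescaling. The step is kept here because the activity asks for it.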

In [7]:
# Define the Random Forest Regressor model with random_state
rf_model = RandomForestRegressor(random_state=42)

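# Hyperparameter tuning with GridSearchCV
# Note: this is a reduced version of the grid listed in the task overview
# (n_estimators=50 and max_depth=10 are left out)
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 20]
}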

grid_search = GridSearchCV(rf_model, 
                           param_grid, 
                           scoring="r2", 
                           cv=5, n_jobs=-1)
grid_search.fit(X_train_scaled, y_train)
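
One optional refinement, not required by the activity: placing the scaler and the forest in a `Pipeline` lets `GridSearchCV` refit the scaler inside each cross-validation fold, so validation folds never influence the scaling statistics. The step names `scaler` and `rf` are arbitrary labels chosen for this sketch, and the grid is shrunk to keep it fast.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestRegressor(random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>"
pipe_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 10],
}

pipe_search = GridSearchCV(pipe, pipe_grid, scoring="r2", cv=5)
pipe_search.fit(X_tr, y_tr)  # unscaled data goes in; the pipeline scales per fold

print(pipe_search.best_params_)
print(f"Mean CV r2 of best setting: {pipe_search.best_score_:.4f}")
```

With this setup, `pipe_search.predict(X_te)` applies the fitted scaler and the best forest in one call, so there is no separate scaled copy of the test set to keep in sync.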

In [8]:
# Best hyperparameters
best_params = grid_search.best_params_
best_rf_model = grid_search.best_estimator_

# Predictions on the test set
y_pred = best_rf_model.predict(X_test_scaled)

# Model evaluation
r2 = r2_score(y_test, y_pred)

# Display the results
print(f"Best Hyperparameters: {best_params}")
print(f"R-squared (r2) on Test Set: {r2:.4f}")

Best Hyperparameters: {'max_depth': None, 'n_estimators': 200}
R-squared (r2) on Test Set: 0.8923
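
For reference, `r2_score` reports one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below reproduces that calculation by hand on a few made-up values (the arrays are illustrative and unrelated to the result above).

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 1.0, 2.0, 1.0])
y_hat = np.array([0.1, 0.9, 1.8, 1.2])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot

print(f"{r2_manual:.4f}")  # 0.9500
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

A value of 1.0 means perfect predictions, while 0.0 means the model does no better than always predicting the mean of `y_true`.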
