# Random Forest Regression

## Healthcare Regression Project

### Problem Statement
Predict disease progression scores in diabetic patients using medical features.

### Dataset
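Scikit-learn's built-in diabetes dataset (`load_diabetes`): 442 patients with 10 baseline predictors (age, sex, BMI, average blood pressure, and six blood serum measurements). The target is a quantitative measure of disease progression one year after baseline.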

### ML Pipeline
Data → EDA → Scaling → Training → Evaluation → Visualization
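
The scaling, training, and evaluation stages above can also be chained into a single scikit-learn `Pipeline`. The following is a minimal sketch rather than the notebook's own method; the names `pipe`, `pipe_pred`, and `pipe_r2` are illustrative, and the split and hyperparameters are assumed to match the cells further down.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same data, split ratio, and seed as used later in the notebook (assumed).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# `pipe` is an illustrative name. Calling fit() fits the scaler on the
# training split only, then trains the forest on the scaled features.
pipe = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=100, random_state=42),
)
pipe.fit(X_train, y_train)

# predict() and score() apply the fitted scaler before calling the model.
pipe_pred = pipe.predict(X_test)
pipe_r2 = pipe.score(X_test, y_test)  # R^2 on the held-out split
print("Pipeline R2:", pipe_r2)
```

Bundling the steps this way keeps preprocessing and model together, so the same object can be reused for cross-validation or new data without repeating the scaling code.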

--------------------------------------------------

## Model Definition
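A random forest is an ensemble of decision trees, each trained on a bootstrap sample of the data. Averaging the trees' predictions reduces variance, which typically improves accuracy and robustness compared with a single tree.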
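
To make the "ensemble" idea concrete, the sketch below checks that a fitted forest's regression output matches the mean of its individual trees' predictions. It is an illustration under assumptions: the 10-tree forest, the seed, and the names `forest`, `per_tree`, and `manual_average` are chosen here and are not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)

# A small forest keeps the illustration fast; 10 trees is an arbitrary choice.
forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X, y)

# Each fitted tree is exposed in `estimators_`; stack their predictions
# into an array of shape (n_trees, n_samples).
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# Average over the tree axis and compare with the forest's own output.
manual_average = per_tree.mean(axis=0)
print("Matches forest.predict:", np.allclose(manual_average, forest.predict(X)))
```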

In [None]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
sns.set()


## Step 1: Load Dataset

In [None]:

data = load_diabetes(as_frame=True)
df = data.frame
df.head()


## Step 2: Exploratory Data Analysis

In [None]:

# Only the last expression in a cell is displayed automatically, so print explicitly.
print(df.describe())

plt.figure(figsize=(6,4))
sns.histplot(df['target'], kde=True)
plt.title("Target Distribution")
plt.show()

plt.figure(figsize=(10,6))
# Fix the color range to [-1, 1] so zero correlation maps to the neutral color.
sns.heatmap(df.corr(), cmap='coolwarm', vmin=-1, vmax=1)
plt.title("Feature Correlation")
plt.show()


## Step 3: Train-Test Split and Scaling

In [None]:

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

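# Fit the scaler on the training split only, then reuse its statistics on the
# test split, so no information from the test data leaks into preprocessing.
# Note: tree-based models do not require scaling; the step is kept for completeness.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)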
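
As a quick sanity check on the scaling step, the sketch below confirms that a scaler fitted on the training split gives that split zero mean and unit variance per feature, while the test split is only approximately standardized because it reuses the training statistics. The names `check_scaler`, `X_tr_s`, and `X_te_s` are illustrative and do not appear elsewhere in the notebook.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Learn per-feature mean and standard deviation from the training split only.
check_scaler = StandardScaler().fit(X_tr)
X_tr_s = check_scaler.transform(X_tr)
X_te_s = check_scaler.transform(X_te)

# Training features: mean ~0 and std ~1 by construction.
print("train means:", np.round(X_tr_s.mean(axis=0), 3))
print("train stds: ", np.round(X_tr_s.std(axis=0), 3))
# Test features: close to, but not exactly, zero mean.
print("test means: ", np.round(X_te_s.mean(axis=0), 3))
```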


## Step 4: Model Training

In [None]:

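from sklearn.ensemble import RandomForestRegressor
# 100 trees; random_state fixes the bootstrap sampling and feature selection
# so the results are reproducible across runs.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)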
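
Once a forest is trained, its impurity-based feature importances give a rough view of which predictors the trees relied on. The sketch below is self-contained and makes some assumptions: it refits a forest on unscaled data so the column names are preserved, and the names `rf` and `importances` are illustrative. Impurity-based importances can be biased, so the ranking is best read as a hint rather than a finding.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

data = load_diabetes(as_frame=True)
X, y = data.data, data.target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Same hyperparameters as the notebook's model, fitted on unscaled features.
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# feature_importances_ is normalized to sum to 1 across features.
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```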


## Step 5: Evaluation

In [None]:

y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2 Score:", r2)

plt.figure(figsize=(6,4))
plt.scatter(y_test, y_pred)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Actual vs Predicted")
plt.show()
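
For reference, the four metrics printed above can be reproduced by hand from their definitions. The sketch below uses small toy arrays chosen only for illustration (they are not results from this notebook) and compares the manual values with scikit-learn's functions; all variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values for illustration only; not output from the model above.
actual = np.array([150.0, 200.0, 90.0, 310.0])
predicted = np.array([140.0, 210.0, 100.0, 290.0])

errors = actual - predicted

mae_manual = np.mean(np.abs(errors))      # mean absolute error
mse_manual = np.mean(errors ** 2)         # mean squared error
rmse_manual = np.sqrt(mse_manual)         # same units as the target

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("MAE :", mae_manual, mean_absolute_error(actual, predicted))
print("MSE :", mse_manual, mean_squared_error(actual, predicted))
print("RMSE:", rmse_manual)
print("R2  :", r2_manual, r2_score(actual, predicted))
```

RMSE is often easier to interpret than MSE because it is expressed in the same units as the target, while R² compares the model against a baseline that always predicts the mean.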


## Conclusion
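This notebook trained a random forest regressor on the scikit-learn diabetes dataset and evaluated it on a held-out 20% test split using MAE, MSE, RMSE, and R², with an actual-vs-predicted plot as a visual check. It shows how regression techniques can help predict disease progression in healthcare.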