# Partial Dependence Plots (PDP) using the California Housing dataset


In [2]:
# Import necessary libraries
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
import plotly.express as px

# Load the California Housing dataset
california = fetch_california_housing()
X = pd.DataFrame(california.data, columns=california.feature_names)
y = california.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Gradient Boosting Regressor
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

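# Create an interactive scatter plot of MedInc against the observed house value.
# The target is attached as a column so that both axes get readable labels.
fig = px.scatter(
    X_train.assign(MedHouseVal=y_train), x='MedInc', y='MedHouseVal',
    marginal_y="histogram",
    marginal_x="histogram",
    title='MedInc vs. Median House Value (training data)'
)
fig.show()


In the scatter plot above, the `x-axis` represents the Median Income (`MedInc`) feature and the `y-axis` represents the observed median house value, while the histograms on the margins show the distribution of each variable. Because the plot shows raw training data rather than model predictions, it is not itself a partial dependence plot: it shows how house value co-varies with `MedInc`, whereas a PDP shows how the model's average prediction changes as `MedInc` is varied and the other features are left as observed.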
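A partial dependence curve is computed from a fitted model rather than from the raw data: for each value on a grid, the feature is overwritten with that value in every row, the model predicts, and the predictions are averaged. The sketch below illustrates that brute-force procedure under stated assumptions. The helper `partial_dependence_1d`, the `ToyModel` class, and the synthetic `X_demo` frame are all hypothetical names introduced for illustration. The toy model has a known formula so the resulting curve can be checked by hand. It is not the notebook's gradient boosting model, and the data are not the California Housing values.

```python
import numpy as np
import pandas as pd


def partial_dependence_1d(model, X, feature, grid):
    """Average model prediction when `feature` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value  # overwrite the feature in every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


class ToyModel:
    """Hypothetical stand-in with a known formula: 2 * MedInc + 0.5 * HouseAge."""

    def predict(self, X):
        return 2.0 * X["MedInc"].to_numpy() + 0.5 * X["HouseAge"].to_numpy()


# Synthetic illustration data (an assumption, not the California Housing values)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "MedInc": rng.uniform(1.0, 10.0, size=200),
    "HouseAge": rng.uniform(1.0, 50.0, size=200),
})

# Evaluate the curve on five evenly spaced MedInc values
grid = np.linspace(X_demo["MedInc"].min(), X_demo["MedInc"].max(), 5)
pdp_values = partial_dependence_1d(ToyModel(), X_demo, "MedInc", grid)

# For this additive toy model the curve is a straight line with slope 2
print(pd.DataFrame({"MedInc": grid, "partial_dependence": pdp_values}))
```

With the notebook's own objects, the equivalent call would be `partial_dependence_1d(model, X_train, "MedInc", grid)` with a grid spanning the observed `MedInc` range, and the result could be drawn with `px.line`. scikit-learn also ships `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay.from_estimator`, which are likely the more convenient choice in practice.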


# Summary
In this notebook, we trained a gradient boosting regressor on the California Housing dataset and visualized the relationship between median income (`MedInc`) and house value. Partial dependence plots build on this kind of view by showing how a model's average prediction changes as a single feature varies, which makes them a useful tool for understanding and interpreting machine learning models.
