# Predicting US Housing Prices at the ZIP Code Level Using Google's Population Dynamics Foundation Model and Zillow Data

## Useful Resources

- [Google's Population Dynamics Foundation Model (PDFM)](https://github.com/google-research/population-dynamics)
- Request access to the PDFM embeddings [here](https://github.com/google-research/population-dynamics?tab=readme-ov-file#getting-access-to-the-embeddings)
- Zillow housing data is available [here](https://www.zillow.com/research/data/)


[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/GeoAI-Tutorials/blob/main/docs/PDFM/zillow_home_value.ipynb)

```python
# %pip install leafmap scikit-learn
```

```python
import os
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from leafmap.common import evaluate_model, plot_actual_vs_predicted, download_file
```

```python
# Zillow Home Value Index (ZHVI) by ZIP code, hosted in the opengeos/datasets GitHub release
zhvi_url = "https://github.com/opengeos/datasets/releases/download/us/zillow_home_value_index_by_zipcode.csv"
zhvi_file = "data/zillow_home_value_index_by_zipcode.csv"
```

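```python
# Download the CSV only if it is not already cached locally
if not os.path.exists(zhvi_file):
    os.makedirs(os.path.dirname(zhvi_file), exist_ok=True)  # make sure data/ exists
    download_file(zhvi_url, zhvi_file)
```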

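```python
# Read ZIP codes as strings so that leading zeros are not lost to integer parsing
zhvi_df = pd.read_csv(zhvi_file, dtype={"RegionName": "string"})
# Index rows as "zip/<ZIP code>" so they can be joined with the PDFM embeddings
zhvi_df.index = zhvi_df["RegionName"].apply(lambda x: f"zip/{x}")
zhvi_df.head()
```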
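The cell above forces `RegionName` to be read as a string. The minimal sketch below shows why, assuming the file stores ZIP codes such as `02138` with their leading zero: default parsing turns the column into integers, which drops the zero, and the resulting `zip/…` keys would no longer match a zero-padded embedding index. The two rows and their values are made up for illustration.

```python
import io

import pandas as pd

# Two hypothetical rows in the same layout as the ZHVI CSV (values are made up).
csv_text = "RegionName,2024-10-31\n02138,1200000\n94110,1400000\n"

# Default parsing infers an integer column, so "02138" becomes 2138.
as_int = pd.read_csv(io.StringIO(csv_text))
print(as_int["RegionName"].tolist())  # [2138, 94110]

# Forcing a string dtype keeps the five-digit ZIP code intact.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"RegionName": "string"})
keys = as_str["RegionName"].map(lambda z: f"zip/{z}")
print(keys.tolist())  # ['zip/02138', 'zip/94110']
```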

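```python
# The PDFM embeddings are not downloaded by this notebook: request access (see
# Useful Resources) and save the ZIP code (ZCTA) embeddings as data/zcta_embeddings.csv
embeddings_file_path = "data/zcta_embeddings.csv"
zipcode_embeddings = pd.read_csv(embeddings_file_path).set_index("place")
zipcode_embeddings.head()
```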

```python
# Keep only ZIP codes that have both a ZHVI record and a PDFM embedding
data = zhvi_df.join(zipcode_embeddings, how="inner")
data.head()
```
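An inner join silently discards any ZIP code that appears on only one side, so a formatting mismatch in the keys shows up as a smaller table rather than an error. The sketch below uses two tiny hypothetical frames (made-up values, two embedding columns instead of 330) to show which rows survive and how to list the keys that were dropped.

```python
import pandas as pd

# Hypothetical ZHVI values keyed like zhvi_df ("zip/" + ZIP code).
zhvi = pd.DataFrame(
    {"2024-10-31": [350_000.0, 410_000.0, 275_000.0]},
    index=["zip/02138", "zip/10001", "zip/99999"],
)
# Hypothetical embeddings with two feature columns (the real file has 330).
emb = pd.DataFrame(
    {"feature0": [0.1, 0.4, 0.7], "feature1": [0.2, 0.5, 0.8]},
    index=["zip/02138", "zip/10001", "zip/60601"],
)

# how="inner" keeps only index labels present in both frames,
# in the order of the calling (left) frame.
joined = zhvi.join(emb, how="inner")
print(joined.index.tolist())  # ['zip/02138', 'zip/10001']

# ZHVI keys with no matching embedding; a long list here usually
# points to a key-format problem such as a missing leading zero.
missing = sorted(set(zhvi.index) - set(emb.index))
print(missing)  # ['zip/99999']
```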

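```python
# The PDFM embedding columns are named feature0 through feature329
embedding_features = [f"feature{x}" for x in range(330)]
# Regression target: the ZHVI value for the month ending 2024-10-31
label = "2024-10-31"
```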
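The cell above hard-codes the 330 embedding columns and the target month. As an alternative, the sketch below derives both from the joined frame, assuming the embedding columns are named `feature<N>` and the ZHVI month columns use the `YYYY-MM-DD` format seen in `"2024-10-31"`. `pick_columns` is a hypothetical helper, not part of leafmap or pandas.

```python
import pandas as pd


def pick_columns(df: pd.DataFrame, prefix: str = "feature"):
    """Hypothetical helper: return (embedding columns, latest date column)."""
    # Keep columns like "feature12" and sort them by their numeric suffix,
    # so feature10 comes after feature9 rather than after feature1.
    features = sorted(
        (c for c in df.columns if c.startswith(prefix) and c[len(prefix):].isdigit()),
        key=lambda c: int(c[len(prefix):]),
    )
    # Parse every column name as a date; non-date names become NaT and are
    # skipped by idxmax, which returns the position of the latest month.
    dates = pd.to_datetime(pd.Series(df.columns), format="%Y-%m-%d", errors="coerce")
    latest = df.columns[dates.idxmax()]
    return features, latest


# Usage on an empty frame that only carries column names.
demo = pd.DataFrame(
    columns=["RegionName", "2024-09-30", "2024-10-31", "feature10", "feature2", "feature0"]
)
features, latest = pick_columns(demo)
print(features)  # ['feature0', 'feature2', 'feature10']
print(latest)  # 2024-10-31
```

On the real data, `pick_columns(data)` would return the feature list and the most recent month available in the file, which may be later than the month fixed above.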

```python
# Drop ZIP codes that have no ZHVI value for the target month
data = data.dropna(subset=[label])
```

```python
# Keep only the model inputs (embeddings) and the target (home value)
data = data[embedding_features + [label]]
X = data[embedding_features]
y = data[label]

# Hold out 20% of ZIP codes for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize and train a simple linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the held-out ZIP codes
y_pred = model.predict(X_test)

# Evaluate the model
evaluation_df = pd.DataFrame({"y": y_test, "y_pred": y_pred})
metrics = evaluate_model(evaluation_df)
print(metrics)
```
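`evaluate_model` comes from leafmap, and its exact output is not described here. As a point of reference, the sketch below computes three common regression metrics (R², MAE, RMSE) directly with scikit-learn; it is an independent sketch, not leafmap's implementation, and `regression_metrics` is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred) -> dict:
    """Hypothetical helper computing R^2, MAE and RMSE with scikit-learn."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "r2": float(r2_score(y_true, y_pred)),  # share of variance explained
        "mae": float(mean_absolute_error(y_true, y_pred)),  # mean |error|, in dollars
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),  # penalizes large misses
    }


# Toy check: errors are +10, -10 and +30.
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(m)
```

In the notebook, `regression_metrics(evaluation_df["y"], evaluation_df["y_pred"])` would give comparable numbers for the held-out ZIP codes.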

```python
# Actual vs. predicted home values, with both axes limited to $0-3M
plot_actual_vs_predicted(evaluation_df, xlim=(0, 3_000_000), ylim=(0, 3_000_000))
```

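```python
# Train a k-nearest neighbors regressor on the same train/test split
k = 5
model = KNeighborsRegressor(n_neighbors=k)
model.fit(X_train, y_train)

# Each prediction is the mean home value of the k training ZIP codes
# whose embeddings are closest to the test ZIP code's embedding
y_pred = model.predict(X_test)

# Evaluate the model
evaluation_df = pd.DataFrame({"y": y_test, "y_pred": y_pred})
metrics = evaluate_model(evaluation_df)
print(metrics)
```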
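With its default `weights="uniform"`, `KNeighborsRegressor` predicts the plain average target of the `k` nearest training points (Euclidean distance by default). The sketch below checks this on synthetic data standing in for embeddings and home values; variable names end in `_demo` so they do not overwrite the notebook's `X_train`, `y_train` or `k`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))  # synthetic stand-in for embeddings
y_demo = rng.uniform(1e5, 1e6, size=50)  # synthetic stand-in for home values
x_query = rng.normal(size=(1, 4))  # one synthetic "test ZIP code"

k_demo = 5
knn = KNeighborsRegressor(n_neighbors=k_demo).fit(X_demo, y_demo)

# Find the same k neighbors explicitly and average their targets.
_, idx = NearestNeighbors(n_neighbors=k_demo).fit(X_demo).kneighbors(x_query)
manual = y_demo[idx[0]].mean()

print(np.isclose(knn.predict(x_query)[0], manual))  # True
```

Because the distance is computed on raw embedding values, any feature with a much larger scale than the others dominates the neighbor search.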

```python
plot_actual_vs_predicted(evaluation_df, xlim=(0, 3_000_000), ylim=(0, 3_000_000))
```
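The k-nearest neighbors model above uses a fixed `k = 5`. One way to choose `k` (and the neighbor weighting) is a cross-validated grid search on the training data only, which is a swapped-in technique rather than part of the original workflow. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data from `make_regression`; the candidate values are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data standing in for (embeddings, home values).
X_demo, y_demo = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)

# Try several k values and both weighting schemes with 5-fold cross-validation,
# scoring each combination by mean R^2 on the validation folds.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 10, 20], "weights": ["uniform", "distance"]},
    scoring="r2",
    cv=5,
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

On the real data, the search would be fitted on `X_train, y_train`, and `search.best_estimator_` evaluated once on `X_test`, so the test split plays no part in choosing `k`.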