# 🏠 Keras + Pandas Tutorial: Real Estate Price Prediction

Learn to combine **Pandas** (data manipulation) with **Keras** (deep learning) to predict house prices.

## Learning Objectives:
1. Load and explore data with Pandas
2. Preprocess data for machine learning
3. Build a neural network with Keras
4. Train, evaluate, and make predictions

In [None]:
# Step 1: Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
import warnings
warnings.filterwarnings('ignore')
print("Libraries imported!")

In [None]:
# Step 2: Load Data with Pandas
df = pd.read_csv('real_estate_data.csv')
print(f"Dataset: {df.shape[0]} rows, {df.shape[1]} columns")
df.head()
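
Before going further, it can help to confirm that the loaded file has the columns the later steps rely on. The sketch below assumes the column names that appear elsewhere in this notebook (`property_id`, `price`, and the feature names used in Step 8). The helper name `missing_columns` and the tiny inline CSV are made up for illustration.

```python
import io
import pandas as pd

# Column names taken from other cells in this notebook (assumed to match the CSV)
EXPECTED_COLUMNS = [
    'property_id', 'bedrooms', 'bathrooms', 'sqft', 'lot_size', 'year_built',
    'garage_spaces', 'property_type', 'location', 'has_pool', 'has_fireplace',
    'distance_to_city_km', 'school_rating', 'crime_rate', 'median_income', 'price',
]

def missing_columns(frame, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the frame."""
    return [col for col in expected if col not in frame.columns]

# Hypothetical two-column CSV standing in for a malformed file
sample = pd.read_csv(io.StringIO("property_id,price\n1,250000\n"))
absent = missing_columns(sample)
print(f"{len(absent)} expected columns are missing")
```

In the notebook itself, the call would be `missing_columns(df)`, and an empty list means every expected column is present.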

In [None]:
# Basic info and statistics
df.info()
print("\n" + "="*50)
df.describe()

In [None]:
# Step 3: Exploratory Data Analysis
# Check missing values
print("Missing values:", df.isnull().sum().sum())

# Price distribution
df['price'].hist(bins=20, edgecolor='black')
plt.title('House Price Distribution')
plt.xlabel('Price ($)')
plt.show()
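
The single total printed above does not say which columns have gaps. A minimal sketch, using a made-up toy frame rather than the tutorial's data, of per-column counts and one simple way to fill numeric gaps (median imputation is an assumption here, not something the dataset is known to need):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a gap in one numeric column
toy = pd.DataFrame({
    'sqft': [1200.0, np.nan, 1800.0, 2400.0],
    'bedrooms': [2, 3, 3, 4],
})

# Missing values per column rather than one grand total
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Fill numeric gaps with each column's median (one simple option among several)
toy_filled = toy.fillna(toy.median(numeric_only=True))
```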

In [None]:
# Correlation with price
numerical_cols = df.select_dtypes(include=[np.number]).columns
correlations = df[numerical_cols].corr()['price'].sort_values(ascending=False)
print("Correlation with Price:")
print(correlations)

In [None]:
# Key feature scatter plots
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
axes[0].scatter(df['sqft'], df['price'], alpha=0.5)
axes[0].set_xlabel('Sqft'); axes[0].set_ylabel('Price')
axes[1].scatter(df['bedrooms'], df['price'], alpha=0.5)
axes[1].set_xlabel('Bedrooms'); axes[1].set_ylabel('Price')
axes[2].scatter(df['median_income'], df['price'], alpha=0.5)
axes[2].set_xlabel('Median Income'); axes[2].set_ylabel('Price')
plt.tight_layout()
plt.show()

In [None]:
# Step 4: Data Preprocessing
df_processed = df.copy()
# property_id looks like a row identifier rather than a predictive feature
df_processed = df_processed.drop('property_id', axis=1)

# Encode categorical variables
categorical_cols = ['property_type', 'location']
label_encoders = {}
for col in categorical_cols:
    le = LabelEncoder()
    df_processed[col] = le.fit_transform(df_processed[col])
    label_encoders[col] = le
    print(f"{col}: {dict(zip(le.classes_, le.transform(le.classes_)))}")
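
`LabelEncoder` assigns integer codes in sorted order, so the network sees an ordering between categories that may not mean anything. One common alternative is one-hot encoding. The sketch below uses `pd.get_dummies` on a toy frame; the category values are invented for illustration, and switching the tutorial to this encoding would also change the number of input features.

```python
import pandas as pd

# Hypothetical category values; the real ones come from the CSV
toy = pd.DataFrame({
    'location': ['Urban', 'Suburban', 'Rural', 'Urban'],
    'sqft': [900, 2500, 1800, 1100],
})

# One indicator column per category instead of a single integer code
encoded = pd.get_dummies(toy, columns=['location'], dtype=int)
print(encoded.columns.tolist())
```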

In [None]:
# Separate features and target
X = df_processed.drop('price', axis=1)
y = df_processed['price']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"Training: {X_train.shape[0]} | Testing: {X_test.shape[0]}")

# Feature scaling (IMPORTANT for neural networks!)
# Fit on the training set only so test-set statistics do not leak into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
print("Features scaled!")
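
To make the fit/transform split concrete, here is a small sketch with hand-picked numbers (not taken from the dataset). It shows that `transform` on the test data reuses the mean and standard deviation learned from the training data, which is the same as computing `(x - train_mean) / train_std` by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical square-footage values, split into train and test by hand
train = np.array([[1000.0], [2000.0], [3000.0]])
test = np.array([[4000.0]])

toy_scaler = StandardScaler()
train_scaled = toy_scaler.fit_transform(train)  # learns mean and std from train
test_scaled = toy_scaler.transform(test)        # reuses the training statistics

# Same result computed by hand: (x - train_mean) / train_std
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```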

In [None]:
# Step 5: Build Keras Neural Network
n_features = X_train_scaled.shape[1]

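from tensorflow.keras import Input

model = Sequential([
    Input(shape=(n_features,)),  # Keras 3 recommends an explicit Input layer over input_shape= on Dense
    Dense(64, activation='relu'),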
    Dropout(0.2),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)  # Output layer for regression
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.summary()
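
The parameter counts reported by `model.summary()` can be checked by hand: a `Dense` layer has one weight per input-unit pair plus one bias per unit, and `Dropout` adds no parameters. The sketch below assumes 14 input features, which matches the 14 columns passed in Step 8; the real count depends on the CSV.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units

# Assumption: 14 input features, matching the 14 columns used in Step 8
n_inputs = 14
layer_units = [64, 32, 16, 1]

total = 0
for units in layer_units:
    total += dense_params(n_inputs, units)
    n_inputs = units  # this layer's output feeds the next layer

print(f"Trainable parameters: {total}")
```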

In [None]:
# Step 6: Train the Model
early_stop = EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True)

history = model.fit(
    X_train_scaled, y_train,
    validation_split=0.2,
    epochs=200,
    batch_size=16,
    callbacks=[early_stop],
    verbose=1
)
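
The `patience` argument can be easier to follow in plain Python. This is a simplified illustration of the idea, not Keras's actual implementation: training stops once the validation loss has gone `patience` epochs without improving, and `restore_best_weights=True` corresponds to rolling back to the best epoch. The function name and the loss values are invented.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float('inf')
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # training would stop here
    return len(val_losses) - 1, best_epoch  # ran through every epoch

# Hypothetical loss curve: improves until epoch 2, then stalls
stop_epoch, best_epoch = early_stopping_epoch(
    [5.0, 4.0, 3.0, 3.5, 3.2, 3.1], patience=2
)
print(stop_epoch, best_epoch)
```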

In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].plot(history.history['loss'], label='Train')
axes[0].plot(history.history['val_loss'], label='Val')
axes[0].set_title('Loss'); axes[0].legend()
axes[1].plot(history.history['mae'], label='Train')
axes[1].plot(history.history['val_mae'], label='Val')
axes[1].set_title('MAE'); axes[1].legend()
plt.tight_layout()
plt.show()

In [None]:
# Step 7: Evaluate Model
test_loss, test_mae = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test MAE: ${test_mae:,.2f}")
print(f"Predictions are off by ~${test_mae:,.0f} on average")
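
MAE alone can hide a few large misses, so it is often reported alongside RMSE and R². A minimal NumPy sketch follows; the helper name `regression_report` and the example prices are made up and are not results from this model.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, rmse, r2

# Hypothetical prices, not results from the tutorial's model
mae, rmse, r2 = regression_report(
    [100_000, 200_000, 300_000],
    [110_000, 190_000, 310_000],
)
print(f"MAE: {mae:,.0f} | RMSE: {rmse:,.0f} | R^2: {r2:.3f}")
```

In the notebook, the equivalent call would be `regression_report(y_test, model.predict(X_test_scaled).flatten())`.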

In [None]:
# Actual vs Predicted plot
y_pred = model.predict(X_test_scaled).flatten()

plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Actual vs Predicted Prices')
plt.show()

In [None]:
# Step 8: Make Predictions on New Data
new_house = pd.DataFrame({
    'bedrooms': [4], 'bathrooms': [3], 'sqft': [2500], 'lot_size': [0.35],
    'year_built': [2015], 'garage_spaces': [2], 'property_type': ['Single Family'],
    'location': ['Suburban'], 'has_pool': [1], 'has_fireplace': [1],
    'distance_to_city_km': [15.0], 'school_rating': [8], 'crime_rate': [2.0],
    'median_income': [85000]
})

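# Preprocess new data (same as training)
new_processed = new_house.copy()
for col in categorical_cols:
    new_processed[col] = label_encoders[col].transform(new_processed[col])
# Match the training column order; the scaler and model depend on it
new_processed = new_processed[X.columns]
new_scaled = scaler.transform(new_processed)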

# Predict
price = model.predict(new_scaled)[0][0]
print(f"Predicted Price: ${price:,.2f}")
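
One thing to watch for with new data: `LabelEncoder.transform` raises a `ValueError` for a category it did not see during `fit`. A small sketch of checking for that up front; the helper name `unseen_values` and the category values are hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical training categories
toy_encoder = LabelEncoder().fit(['Rural', 'Suburban', 'Urban'])

def unseen_values(encoder, values):
    """Return the values the encoder did not see during fit."""
    known = set(encoder.classes_)
    return [v for v in values if v not in known]

unknown = unseen_values(toy_encoder, ['Suburban', 'Downtown'])
if unknown:
    print(f"Cannot encode unseen categories: {unknown}")
```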

In [None]:
# Step 9: Save Model
model.save('house_price_model.keras')
print("Model saved!")

# Load model (for future use)
from tensorflow.keras.models import load_model
loaded_model = load_model('house_price_model.keras')
print("Model loaded and ready to use!")
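
The saved `.keras` file holds only the network. Predictions on new data also need the fitted scaler and label encoders, so those usually have to be stored as well. A minimal sketch using `joblib` (installed alongside scikit-learn); the file name and the toy stand-in objects are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Stand-ins for the fitted objects from Step 4 (toy data, for illustration only)
toy_scaler = StandardScaler().fit(np.array([[1000.0], [2000.0], [3000.0]]))
toy_encoders = {'location': LabelEncoder().fit(['Rural', 'Suburban', 'Urban'])}

# Hypothetical file name; a temporary directory keeps the example self-cleaning
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'preprocessing.joblib')
    joblib.dump({'scaler': toy_scaler, 'label_encoders': toy_encoders}, path)
    restored = joblib.load(path)

# The restored scaler behaves like the original one
restored_scaled = restored['scaler'].transform(np.array([[2000.0]]))
```

In the notebook, the same pattern would apply to `scaler` and `label_encoders`, written next to `house_price_model.keras`.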

## 📝 Key Takeaways

### Pandas Functions Used:
- `pd.read_csv()` - Load CSV data
- `df.head()`, `df.info()`, `df.describe()` - Explore data
- `df.drop()` - Remove columns
- `df.select_dtypes()` - Select column types

### Keras Classes and Methods Used:
- `Sequential()` - Create a model as a linear stack of layers
- `Dense()` - Add fully connected layers
- `Dropout()` - Reduce overfitting by randomly zeroing activations during training
- `model.compile()` - Configure training
- `model.fit()` - Train model
- `model.evaluate()` - Test performance
- `model.predict()` - Make predictions