Ilyas Ustun  

# Neural Network Regression Exercise Solutions

## Objective
In this exercise, you will build and train a simple neural network to predict house prices using the California Housing dataset. You'll learn how to:

1. Load and explore data
2. Prepare data for neural networks
3. Build a simple neural network for regression
4. Train and evaluate the model
5. Visualize results

## Dataset
We'll use the California Housing dataset from sklearn, which contains information about housing districts in California. Our goal is to predict the median house value based on features like location, population, and income.

## Task 1: Import Required Libraries

**Solution Explanation:**
We import all necessary libraries for data handling, model building, and visualization. Each library serves a specific purpose:
- `tensorflow`: For building and training neural networks
- `numpy`: For numerical operations
- `matplotlib`: For creating visualizations
- `sklearn`: For dataset loading, preprocessing, and evaluation metrics

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

print("Libraries imported successfully!")
print(f"TensorFlow version: {tf.__version__}")

## Task 2: Load and Explore the Dataset

**Solution Explanation:**
We load the California Housing dataset and explore its structure to understand what we're working with. This includes checking the shape of data, feature names, and basic statistics.

In [None]:
# Load the California Housing dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

print("Dataset Information:")
print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"Feature names: {housing.feature_names}")
print(f"Target name: {housing.target_names}")  # median house value, in units of $100,000
print(f"Dataset description: {housing.DESCR[:200]}...")

In [None]:
# Basic statistics
print("\nBasic Statistics (target in units of $100,000):")
print(f"House values - Mean: {np.mean(y):.2f}, Std: {np.std(y):.2f}")
print(f"Value range: {np.min(y):.2f} - {np.max(y):.2f}")
print(f"Number of samples: {len(y)}")
print(f"Number of features: {X.shape[1]}")

In [None]:
# Plot distribution of house prices
plt.figure(figsize=(10, 6))
plt.hist(y, bins=50, alpha=0.7, color='skyblue', edgecolor='black')
plt.xlabel('Median House Value (hundreds of thousands of dollars)')
plt.ylabel('Frequency')
plt.title('Distribution of House Prices in California Housing Dataset')
plt.grid(True, alpha=0.3)
plt.show()

print("Most district median values fall roughly between 1 and 3 ($100,000-$300,000); the spike at the right edge comes from values capped at about 5.0.")
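
Instead of hard-coding the claim about the price range, you can measure it. The sketch below uses a hypothetical helper, `describe_target`, that reports the share of values inside a range and how many values sit at an assumed cap of 5.0 (the maximum printed above suggests the target is capped there). It runs on toy values; in the notebook you would pass `y`.

```python
import numpy as np

def describe_target(y, low=1.0, high=3.0, cap=5.0):
    """Summarize a target array measured in units of $100,000.

    Returns the fraction of values inside [low, high] and the number of
    values at or above `cap` (an assumed censoring threshold).
    """
    y = np.asarray(y, dtype=float)
    in_range = np.mean((y >= low) & (y <= high))
    n_capped = int(np.sum(y >= cap))
    return float(in_range), n_capped

# Toy values for illustration only; in the notebook you would pass `y`.
toy_y = np.array([0.5, 1.2, 1.8, 2.5, 2.9, 3.4, 5.00001, 5.00001])
frac, capped = describe_target(toy_y)
print(f"{frac:.0%} of values in [1, 3]; {capped} values at the cap")
# 50% of values in [1, 3]; 2 values at the cap
```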

## Task 3: Prepare the Data

**Solution Explanation:**
Data preparation is crucial for neural networks. We:
1. Split the data into training (80%) and testing (20%) sets
2. Standardize the features with StandardScaler so they all share a similar scale (mean 0, standard deviation 1)

Standardization helps the neural network converge faster and prevents features with large numerical ranges (such as population) from dominating the weight updates.
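
The two steps above can be sketched from scratch in NumPy to show what `train_test_split` and `StandardScaler` do under the hood. This is an illustration, not scikit-learn's implementation: `split_and_standardize` is a hypothetical helper, and its random permutation will not reproduce the exact split of `train_test_split(..., random_state=42)`. The key point is that the mean and standard deviation come from the training rows only.

```python
import numpy as np

def split_and_standardize(X, y, test_size=0.2, seed=42):
    """Shuffle, split into train/test, then standardize with train statistics only."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Statistics come from the training set only, so no test information leaks in.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0        # guard against constant features
    return (X_train - mean) / std, (X_test - mean) / std, y_train, y_test

# Toy data: two features on very different scales.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(100, 2))
y_demo = X_demo[:, 0] * 0.5
Xtr, Xte, ytr, yte = split_and_standardize(X_demo, y_demo)
print(Xtr.shape, Xte.shape)  # (80, 2) (20, 2)
print(Xtr.mean(axis=0).round(6), Xtr.std(axis=0).round(6))
```

The test rows are transformed with the training statistics, so their mean and standard deviation will be close to, but not exactly, 0 and 1.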

In [None]:
# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Data split completed:")
print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")

In [None]:
# Normalize features using StandardScaler
# This is crucial for neural networks as it helps with convergence
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit on training data
X_test_scaled = scaler.transform(X_test)        # Transform test data using same scaler

print("Data normalization completed:")
print(f"Training set shape: {X_train_scaled.shape}")
print(f"Test set shape: {X_test_scaled.shape}")
print(f"Training targets shape: {y_train.shape}")
print(f"Test targets shape: {y_test.shape}")

# Verify normalization (should be close to 0 mean, 1 std)
print(f"\nAfter normalization - Training data mean: {np.mean(X_train_scaled):.6f}")
print(f"After normalization - Training data std: {np.std(X_train_scaled):.6f}")

## Task 4: Build a Simple Neural Network

**Solution Explanation:**
We create a Sequential model with:
- Input layer: 8 features (housing characteristics)
- Hidden layer 1: 50 neurons with ReLU activation
- Hidden layer 2: 25 neurons with ReLU activation
- Output layer: 1 neuron (house price prediction) with no activation for regression
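
To make the architecture concrete, here is a minimal NumPy forward pass for the same 8 → 50 → 25 → 1 stack, with ReLU on the hidden layers and a linear output. The helper names and the He-style random initialization are assumptions for illustration (Keras's `Dense` defaults to Glorot uniform). The parameter count, weights plus biases per layer, should match the total that `model.summary()` reports.

```python
import numpy as np

def init_layers(sizes, seed=0):
    """Create (weights, bias) pairs for a fully connected stack, e.g. [8, 50, 25, 1]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(X, layers):
    """ReLU on every hidden layer, identity (linear) on the output layer."""
    h = X
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h

layers = init_layers([8, 50, 25, 1])
n_params = sum(W.size + b.size for W, b in layers)
print(n_params)  # (8*50 + 50) + (50*25 + 25) + (25*1 + 1) = 1751
out = forward(np.ones((4, 8)), layers)
print(out.shape)  # (4, 1): one prediction per input row
```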

In [None]:
# Create a Sequential model with 2 hidden layers
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train_scaled.shape[1],)),  # 8 input features
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(1)  # Output layer for regression (no activation)
])

print("Model created successfully!")
print("\nModel Architecture:")
print("- Input layer: 8 features (housing characteristics)")
print("- Hidden layer 1: 50 neurons with ReLU activation")
print("- Hidden layer 2: 25 neurons with ReLU activation")
print("- Output layer: 1 neuron (house price prediction)")

In [None]:
# Display model architecture
model.summary()

## Task 5: Compile the Model

**Solution Explanation:**
Model compilation configures the training process:
- **Optimizer**: Adam (adapts the step size for each parameter; a sensible default for many problems)
- **Loss function**: Mean Squared Error (MSE) - the standard choice for regression
- **Metrics**: Mean Absolute Error (MAE) - easier to interpret than MSE because it is in the same units as the target
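
A small worked example shows why MSE and MAE tell different stories. Both sets of toy predictions below have the same MAE, but concentrating the error in one large miss quadruples the MSE, because squaring weights large errors more heavily. The `mse` and `mae` functions are plain NumPy re-implementations for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared differences."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    """Mean of absolute differences (same units as the target)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

y_true = np.array([2.0, 2.0, 2.0, 2.0])
small_errors = np.array([2.5, 1.5, 2.5, 1.5])   # four errors of 0.5
one_big_error = np.array([2.0, 2.0, 2.0, 4.0])  # one error of 2.0

print(mae(y_true, small_errors), mse(y_true, small_errors))    # 0.5 0.25
print(mae(y_true, one_big_error), mse(y_true, one_big_error))  # 0.5 1.0
```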

In [None]:
# Configure the model for training
model.compile(
    optimizer='adam',                    # Adaptive learning rate optimizer
    loss='mean_squared_error',          # MSE for regression
    metrics=['mean_absolute_error']     # MAE for easier interpretation
)

print("Model compiled successfully!")
print("\nCompilation settings:")
print("- Optimizer: Adam (adaptive learning rate)")
print("- Loss function: Mean Squared Error (MSE)")
print("- Metrics: Mean Absolute Error (MAE)")

## Task 6: Train the Model

**Solution Explanation:**
Training parameters:
- **Epochs**: 100 (number of complete passes through the training data)
- **Batch size**: 32 (number of samples processed before updating weights)
- **Validation split**: 0.2 (Keras holds out the last 20% of the training samples, taken before shuffling, for validation)
- **Verbose**: 1 (show training progress)
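
The loop below sketches what `epochs`, `batch_size`, and `validation_split` mean mechanically, using mini-batch gradient descent on a plain linear model instead of the neural network (an assumed simplification to keep the example short). The helper `fit_linear_minibatch`, the learning rate, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_linear_minibatch(X, y, epochs=100, batch_size=32, validation_split=0.2,
                         lr=0.05, seed=0):
    """Mini-batch gradient descent on MSE for a linear model y ~ X @ w + b.

    Like Keras's validation_split, the *last* fraction of the rows is held out
    before any shuffling; only the remaining rows are shuffled each epoch.
    """
    rng = np.random.default_rng(seed)
    n_val = int(len(X) * validation_split)
    X_fit, y_fit = X[:len(X) - n_val], y[:len(y) - n_val]
    X_val, y_val = X[len(X) - n_val:], y[len(y) - n_val:]

    w = np.zeros(X.shape[1])
    b = 0.0
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):                        # one epoch = one full pass
        order = rng.permutation(len(X_fit))
        for start in range(0, len(X_fit), batch_size):
            idx = order[start:start + batch_size]  # one mini-batch
            err = X_fit[idx] @ w + b - y_fit[idx]
            # Gradients of the batch MSE with respect to w and b
            w -= lr * 2 * X_fit[idx].T @ err / len(idx)
            b -= lr * 2 * err.mean()
        # Losses measured once per epoch on each subset
        history["loss"].append(float(np.mean((X_fit @ w + b - y_fit) ** 2)))
        history["val_loss"].append(float(np.mean((X_val @ w + b - y_val) ** 2)))
    return w, b, history

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=500)
w, b, hist = fit_linear_minibatch(X_demo, y_demo, epochs=20)
print(np.round(w, 2), round(b, 2))
print(f"val_loss: {hist['val_loss'][0]:.4f} -> {hist['val_loss'][-1]:.4f}")
```

One difference from Keras: the training loss recorded here is computed on the full training subset at the end of each epoch, whereas Keras reports the average of the batch losses seen during the epoch.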

In [None]:
# Train the model with validation split
print("Starting training...")
history = model.fit(
    X_train_scaled, y_train,
    epochs=100,
    batch_size=32,
    validation_split=0.2,  # Use 20% of training data for validation
    verbose=1
)

print("\nTraining completed!")

## Task 7: Visualize Training Progress

**Solution Explanation:**
Training visualization helps us understand:
- **Loss curves**: How well the model is learning (decreasing loss is good)
- **Validation vs. training**: Validation loss that stays well above training loss, or starts rising while training loss keeps falling, suggests overfitting
- **MAE curves**: Mean Absolute Error in actual units (hundreds of thousands of dollars)
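
The curves can also be summarized numerically. The hypothetical helper below reports the epoch with the lowest validation loss, the final relative gap between validation and training loss (the same quantity this notebook prints as an overfitting indicator), and whether validation loss was still improving at the last epoch. It is shown on toy curves; with the real history you would call `summarize_history(history.history['loss'], history.history['val_loss'])`.

```python
def summarize_history(loss, val_loss):
    """Return (best_epoch, gap_pct, still_improving) for per-epoch loss lists.

    best_epoch is 1-based. A positive gap_pct means the final validation
    loss is higher than the final training loss.
    """
    best_epoch = val_loss.index(min(val_loss)) + 1
    gap_pct = (val_loss[-1] - loss[-1]) / loss[-1] * 100
    still_improving = best_epoch == len(val_loss)
    return best_epoch, gap_pct, still_improving

# Toy curves: training keeps improving while validation bottoms out at epoch 3.
train = [1.00, 0.60, 0.40, 0.30, 0.25]
val = [1.10, 0.70, 0.50, 0.55, 0.60]
print(summarize_history(train, val))
```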

In [None]:
# Plot training history
plt.figure(figsize=(15, 5))

# Plot loss
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss', linewidth=2)
plt.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.title('Model Loss During Training')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot MAE
plt.subplot(1, 2, 2)
plt.plot(history.history['mean_absolute_error'], label='Training MAE', linewidth=2)
plt.plot(history.history['val_mean_absolute_error'], label='Validation MAE', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error')
plt.title('Model MAE During Training')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Training analysis:")
final_train_loss = history.history['loss'][-1]
final_val_loss = history.history['val_loss'][-1]
print(f"Final training loss: {final_train_loss:.4f}")
print(f"Final validation loss: {final_val_loss:.4f}")
gap_pct = (final_val_loss - final_train_loss) / final_train_loss * 100
print(f"Validation vs. training loss: {gap_pct:+.1f}% (large positive values suggest overfitting)")

## Task 8: Test the Model

**Solution Explanation:**
Model evaluation on unseen test data gives us the true performance:
- **Test Loss (MSE)**: Average squared error
- **Test MAE**: Average absolute error in hundreds of thousands of dollars
- **Individual predictions**: Examples of how well the model predicts
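
Because the target is measured in units of $100,000, MSE is in squared units and is hard to read directly. Taking the square root gives RMSE in the target's units, and multiplying by 100,000 converts both RMSE and MAE into dollars. The numbers below are illustrative inputs, not results from this notebook, and `metrics_in_dollars` is a hypothetical helper. RMSE is always at least as large as MAE; a large gap between them indicates a few large errors.

```python
import math

UNIT_DOLLARS = 100_000  # the target is measured in units of $100,000

def metrics_in_dollars(mse, mae):
    """Convert MSE/MAE reported in target units into RMSE and MAE in dollars."""
    rmse = math.sqrt(mse)
    return rmse * UNIT_DOLLARS, mae * UNIT_DOLLARS

# Illustrative numbers only, not results from this notebook.
rmse_usd, mae_usd = metrics_in_dollars(mse=0.25, mae=0.35)
print(f"RMSE about ${rmse_usd:,.0f}, MAE about ${mae_usd:,.0f}")
# RMSE about $50,000, MAE about $35,000
```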

In [None]:
# Evaluate model on test set
test_loss, test_mae = model.evaluate(X_test_scaled, y_test, verbose=0)
print("Test Results:")
print(f"Test Loss (MSE): {test_loss:.4f}")
print(f"Test MAE: {test_mae:.4f}")
print("\nInterpretation (target units are $100,000):")
print(f"On average, predictions are off by about ${test_mae * 100_000:,.0f}")
print(f"Root Mean Squared Error: about ${np.sqrt(test_loss) * 100_000:,.0f}")

In [None]:
# Make predictions
predictions = model.predict(X_test_scaled)

# Show first 10 predictions vs actual values
print("First 10 Predictions vs Actual Values:")
print("Predicted | Actual | Error")
print("-" * 30)
for i in range(10):
    error = abs(predictions[i][0] - y_test[i])
    print(f"{predictions[i][0]:9.2f} | {y_test[i]:6.2f} | {error:5.2f}")

## Task 9: Visualize Results

**Solution Explanation:**
Result visualization helps us understand model performance:
- **Scatter plot**: Points close to diagonal line indicate good predictions
- **R² score**: Coefficient of determination (closer to 1.0 is better)
- **Error distribution**: Shows whether errors are centered on zero and roughly symmetric; a long tail on one side points to systematic under- or over-prediction
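
R² compares the model's squared error with that of a baseline that always predicts the mean: R² = 1 - SS_res / SS_tot. The NumPy sketch below mirrors what `sklearn.metrics.r2_score` computes for a single output; the `r2` function itself is a hypothetical re-implementation. It also shows that R² can be negative when predictions are worse than the mean baseline, so reading R² as "percentage of variance explained" only makes sense when it lies between 0 and 1.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # accepts (n, 1) predictions too
    ss_res = np.sum((y_true - y_pred) ** 2)           # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # mean baseline's squared error
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r2(y, y))                          # 1.0 (perfect predictions)
print(r2(y, np.full_like(y, y.mean())))  # 0.0 (always predicting the mean)
print(r2(y, y[::-1]))                    # -3.0 (worse than predicting the mean)
```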

In [None]:
# Create visualizations
plt.figure(figsize=(15, 5))

# Scatter plot of actual vs predicted values
plt.subplot(1, 2, 1)
plt.scatter(y_test, predictions, alpha=0.5, color='blue')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
plt.xlabel('Actual House Prices')
plt.ylabel('Predicted House Prices')
plt.title('Actual vs Predicted House Prices')
plt.grid(True, alpha=0.3)

# Add R² score
r2 = r2_score(y_test, predictions)
plt.text(0.05, 0.95, f'R² = {r2:.3f}', transform=plt.gca().transAxes, 
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

# Histogram of prediction errors
plt.subplot(1, 2, 2)
errors = y_test - predictions.flatten()
plt.hist(errors, bins=30, alpha=0.7, color='lightcoral', edgecolor='black')
plt.xlabel('Prediction Error (Actual - Predicted)')
plt.ylabel('Frequency')
plt.title('Distribution of Prediction Errors')
plt.grid(True, alpha=0.3)

# Add statistics
plt.axvline(np.mean(errors), color='red', linestyle='--', linewidth=2, label=f'Mean: {np.mean(errors):.3f}')
plt.axvline(np.median(errors), color='green', linestyle='--', linewidth=2, label=f'Median: {np.median(errors):.3f}')
plt.legend()

plt.tight_layout()
plt.show()

print(f"Model Performance Summary:")
print(f"R² Score: {r2:.3f} (explains {r2*100:.1f}% of variance)")
print(f"Mean Error: {np.mean(errors):.3f} (should be close to 0)")
print(f"Error Standard Deviation: {np.std(errors):.3f}")

## Bonus Challenge: Improved Model

**Solution Explanation:**
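Let's try to improve our model with:
- **A wider, deeper network**: three hidden layers (100 → 50 → 25 neurons) instead of two
- **Dropout layers**: Reduce overfitting by randomly zeroing 20% of the previous layer's outputs at each training step (dropout is inactive during evaluation and prediction)
- **Early stopping**: Stop training once validation loss has not improved for 10 epochs, then restore the best weights
- **Explicit learning rate**: Set Adam's learning rate directly (0.001, the Keras default) so it is easy to tune; a learning-rate schedule is a natural next step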
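
The training code in this section uses a fixed learning rate. One way to add learning-rate scheduling is Keras's `tf.keras.callbacks.ReduceLROnPlateau`, which lowers the rate when a monitored quantity stops improving. The pure-Python sketch below mimics its core rule in simplified form (it ignores options such as `min_delta` and `cooldown`); `plateau_schedule` and the toy validation losses are hypothetical.

```python
def plateau_schedule(val_losses, lr=1e-3, factor=0.5, patience=3, min_lr=1e-5):
    """Return the learning rate in effect after each epoch.

    Simplified reduce-on-plateau rule: if the validation loss has not improved
    for `patience` consecutive epochs, multiply the rate by `factor`
    (never going below `min_lr`) and restart the waiting counter.
    """
    best = float("inf")
    wait = 0
    rates = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0   # new best: reset the counter
        else:
            wait += 1              # another epoch without improvement
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        rates.append(lr)
    return rates

val = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.59, 0.60, 0.60, 0.60]
print(plateau_schedule(val))  # the rate halves at epochs 6 and 10
```

In the notebook, a callback such as `tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-5)` could be added to the `callbacks` list next to early stopping; these parameter values are assumptions to tune, not recommendations.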

In [None]:
print("Building improved model with regularization...")

# Try a deeper network with dropout for regularization
improved_model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train_scaled.shape[1],)),  # 8 input features
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dropout(0.2),  # Dropout for regularization
    tf.keras.layers.Dense(50, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Compile with an explicit learning rate (0.001 is also Adam's default)
improved_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss='mean_squared_error',
    metrics=['mean_absolute_error']
)

improved_model.summary()
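
Keras's `Dropout` layer uses inverted dropout: during training it zeroes each input with probability `rate` and scales the survivors by `1 / (1 - rate)`, and at inference it passes inputs through unchanged. The NumPy sketch below reproduces that behavior for illustration (the `dropout` function is hypothetical, not the Keras source). Because dropout is active only during training, the improved model's training loss can sit above its validation loss, which does not by itself indicate a problem.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units during training and
    scale the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time (training=False) the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 100))
out = dropout(h, rate=0.2, training=True, rng=rng)
print(f"fraction zeroed: {np.mean(out == 0):.3f}")    # close to 0.2
print(f"mean activation: {out.mean():.3f}")            # close to 1.0
print(np.array_equal(dropout(h, 0.2, False, rng), h))  # True
```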

In [None]:
# Train with early stopping
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True
)

print("Training improved model...")
improved_history = improved_model.fit(
    X_train_scaled, y_train,
    epochs=150,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stopping],
    verbose=1
)

print(f"\nTraining stopped after {len(improved_history.history['loss'])} epochs")
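
The stopping rule behind `EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)` can be sketched as follows, in simplified form (no `min_delta` or `baseline`). `early_stopping_epoch` is a hypothetical helper run on toy losses with a smaller patience; its first return value corresponds to the epoch count printed above.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Simplified patience rule used by early stopping.

    Returns (epochs_run, best_epoch), both 1-based. Training stops once
    `patience` epochs pass without a new best validation loss; when it stops
    this way and restore_best_weights=True, the weights from `best_epoch`
    are restored.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch  # ran out of epochs before stopping

val = [0.50, 0.42, 0.40, 0.41, 0.43, 0.41, 0.44, 0.45]
print(early_stopping_epoch(val, patience=3))  # (6, 3)
```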

In [None]:
# Evaluate improved model
improved_test_loss, improved_test_mae = improved_model.evaluate(X_test_scaled, y_test, verbose=0)
improved_predictions = improved_model.predict(X_test_scaled)
improved_r2 = r2_score(y_test, improved_predictions)

print(f"Model Comparison:")
print(f"Original Model  - Test MAE: {test_mae:.4f}, R²: {r2:.4f}")
print(f"Improved Model  - Test MAE: {improved_test_mae:.4f}, R²: {improved_r2:.4f}")
print(f"\nMAE reduction: {((test_mae - improved_test_mae) / test_mae * 100):.1f}% (negative means the improved model did worse)")
print(f"R² change: {improved_r2 - r2:+.4f} (absolute difference)")

In [None]:
# Compare training histories
plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
plt.plot(history.history['val_loss'], label='Original Model', linewidth=2)
plt.plot(improved_history.history['val_loss'], label='Improved Model', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Validation Loss')
plt.title('Model Comparison - Validation Loss')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(history.history['val_mean_absolute_error'], label='Original Model', linewidth=2)
plt.plot(improved_history.history['val_mean_absolute_error'], label='Improved Model', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Validation MAE')
plt.title('Model Comparison - Validation MAE')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Summary Questions - Answers

Here are the answers to help check your understanding:

### 1. What does the loss function measure in regression?

**Answer:** The loss function measures how far the predictions are from the actual values. The one used here, Mean Squared Error (MSE), is the average squared difference between predicted and actual values. Because errors are squared, it penalizes large errors much more heavily than small ones, which pushes the model to avoid big mistakes.

### 2. Why do we normalize the input features?

**Answer:** Standardization (here with StandardScaler) puts all features on a comparable scale (mean 0, standard deviation 1, computed on the training set), preventing features with large numerical ranges from dominating the learning process. This helps the neural network converge faster and more reliably.

### 3. What does it mean if the validation loss is much higher than training loss?

**Answer:** This indicates **overfitting**: the model fits the training data too closely, including its noise, and does not generalize well to new, unseen data. It performs well on training data but noticeably worse on validation or test data.

### 4. How can you tell if your model is making good predictions from the scatter plot?

**Answer:** Good predictions show points clustered close to the diagonal line (y=x) in the actual vs. predicted scatter plot. An R² score close to 1.0 also indicates good performance, meaning the model explains most of the variance in the target.

### 5. What would you try next to improve your model's performance?

**Answer:** Several strategies:
- **Architecture**: Add more layers or neurons, try different activation functions
- **Regularization**: Use dropout, L1/L2 regularization to prevent overfitting
- **Training**: Adjust learning rate, use learning rate scheduling, train longer with early stopping
- **Data**: Feature engineering, collect more data, handle outliers
- **Ensemble**: Combine multiple models for better predictions
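
Ensembling is easy to sketch: average the predictions of several models. Because squared error is convex, the averaged prediction's MSE is never higher than the models' average MSE, and the gain is largest when their errors are not strongly correlated. The three "models" below are simulated as the true values plus independent noise, an assumption that makes the effect easy to see; `ensemble_average` is a hypothetical helper.

```python
import numpy as np

def ensemble_average(predictions):
    """Average the predictions of several models (one row per model)."""
    return np.mean(np.asarray(predictions, dtype=float), axis=0)

rng = np.random.default_rng(0)
y_true = rng.normal(2.0, 1.0, size=200)
# Three simulated models whose errors are independent noise (an assumption).
preds = [y_true + rng.normal(0.0, 0.5, size=200) for _ in range(3)]

individual_mse = [np.mean((p - y_true) ** 2) for p in preds]
ensemble_mse = np.mean((ensemble_average(preds) - y_true) ** 2)
print(np.round(individual_mse, 3), round(float(ensemble_mse), 3))
```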

## Key Takeaways

🎯 **Main Learning Points:**

1. **Data preprocessing is crucial** - Normalization helps neural networks converge faster and more reliably

2. **Monitor training progress** - Use validation data to detect overfitting and guide training decisions

3. **Multiple evaluation metrics** - MSE, MAE, and R² provide different perspectives on model performance

4. **Visualization is powerful** - Plots help understand model behavior and identify issues

5. **Regularization helps** - Techniques like dropout and early stopping prevent overfitting

6. **Experimentation is key** - Try different architectures, hyperparameters, and techniques to improve performance

**🎉 Exercise completed successfully!** You now have a solid foundation in neural network regression using TensorFlow and Keras.