### Codio Activity 23.5: Neural Networks for Regression

**Expected Time = 90 minutes** 

**Total Points = 40** 

This activity uses a neural network to build a regression model that predicts housing prices. Most of the work mirrors your earlier classification models. The main differences are the loss function (mean squared error rather than a classification loss such as cross-entropy) and the output layer (a single unit with a linear activation).
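
The following is a small NumPy sketch, not Keras code, that contrasts the two kinds of output head using made-up numbers. A classification head turns one raw score per class into probabilities with softmax and is scored with cross-entropy. A regression head emits a single unbounded number and is scored with squared error. The logits, targets, and predictions are all hypothetical.

```python
import numpy as np

# Classification head (hypothetical values): one raw score ("logit") per class.
logits = np.array([[2.0, 0.5, -1.0]])                      # shape (1 sample, 3 classes)
shifted = logits - logits.max(axis=1, keepdims=True)       # subtract max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # softmax
true_class = np.array([0])
cross_entropy = -np.log(probs[np.arange(len(true_class)), true_class]).mean()

# Regression head (hypothetical values): a single linear unit outputs the prediction directly.
y_pred = np.array([[2.1]])                                 # shape (1 sample, 1 output)
y_true = np.array([[2.5]])
mse = np.mean((y_true - y_pred) ** 2)                      # squared error, averaged over samples

print(probs.round(3), round(cross_entropy, 4))
print(round(mse, 4))
```

The regression output has no squashing function, so it can take any real value. That is what a continuous target such as a house price requires.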

#### Index

- [Problem 1](#-Problem-1)
- [Problem 2](#-Problem-2)
- [Problem 3](#-Problem-3)
- [Problem 4](#-Problem-4)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential




In [3]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

#### The Data

The dataset contains aggregate information about houses in California, USA, with one row per neighborhood (census block group). The data is loaded, and a description of it is printed to the screen. Your goal is to predict the median house value (`MedHouseVal`) for each neighborhood.

In [7]:
houses = fetch_california_housing(as_frame=True)
houses.frame.head()

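|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |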


In [9]:
print(houses.DESCR)

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census block group.

In [11]:
X, y = houses.data, houses.target
X = StandardScaler().fit_transform(X)  # standardize each feature to zero mean and unit variance
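
`StandardScaler` rescales each column as z = (x - mean) / std, so features on very different scales end up comparable. Compare, for example, `MedInc` (around 1 to 15) with `Population` (thousands). The NumPy sketch below reproduces that transformation. `standardize` is a hypothetical helper, and the toy values are only loosely based on the table above. It assumes, as scikit-learn does by default, the population standard deviation (`ddof=0`) and leaves constant columns unscaled.

```python
import numpy as np

def standardize(X):
    """Hypothetical helper: z-score each column using its mean and
    population standard deviation (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)              # ddof=0: population standard deviation
    std[std == 0] = 1.0              # avoid dividing by zero for constant columns
    return (X - mean) / std

# Toy data: two features on very different scales (income-like vs. population-like)
X_toy = np.array([[8.3, 322.0],
                  [7.3, 496.0],
                  [3.8, 2401.0]])
Z = standardize(X_toy)
print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))  # approximately [0, 0] and [1, 1]
```

In the notebook cell, the scaler is fitted on all rows, including those that `validation_split` later holds out. This keeps the activity simple. In a stricter workflow the scaler would be fitted on the training rows only.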

[Back to top](#-Index)

### Problem 1

#### The Network Architecture

Use `Sequential()` to create a neural network `model` with the following architecture:

- A hidden `Dense` layer with 100 units and `activation` equal to `relu`
- An output `Dense` layer with 1 unit and `activation` equal to `linear`
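
What this architecture computes can be sketched in NumPy as a forward pass: a ReLU hidden layer followed by a single linear output unit. The weights here are random and purely illustrative, since Keras uses its own default initializers. The sketch also counts the trainable parameters for the 8 input features of this dataset.

```python
import numpy as np

rng = np.random.default_rng(42)
n_features, n_hidden = 8, 100        # 8 California housing features, 100 hidden units

# Illustrative random weights; Keras initializes its layers differently.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    h = np.maximum(0.0, X @ W1 + b1)   # hidden layer: Dense(100, activation='relu')
    return h @ W2 + b2                 # output layer: Dense(1, activation='linear')

X_batch = rng.normal(size=(5, n_features))
print(forward(X_batch).shape)          # one real-valued prediction per row

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                        # 8*100 + 100 + 100*1 + 1
```

The parameter count is 8 × 100 weights plus 100 biases in the hidden layer, and 100 × 1 weights plus 1 bias in the output layer, for a total of 1,001. Because no input shape is passed to `Sequential`, Keras only creates these weights once it first sees data.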

In [15]:
### GRADED
tf.random.set_seed(42)
model = Sequential([
    Dense(100, activation='relu'),
    Dense(1, activation='linear')
])

### ANSWER CHECK
model.layers[0].units

100

[Back to top](#-Index)

### Problem 2

#### Compiling the Network


Use the method `compile()` to compile `model` with `mse` as the `loss` and `['mse']` as the `metrics`. No optimizer is specified, so Keras falls back to its default optimizer, `rmsprop`.
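
With `mse` as the loss, training minimizes the average squared residual. The optimizer only needs the gradient of that loss with respect to each prediction, which backpropagation then passes back to the weights. The NumPy sketch below computes both quantities for the first three `MedHouseVal` targets from the table above, paired with hypothetical predictions. `mse_loss` and `mse_grad` are illustrative helper names, not Keras functions.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

def mse_grad(y_true, y_pred):
    """Gradient of the MSE with respect to each prediction: 2 * (y_pred - y_true) / n."""
    return 2.0 * (y_pred - y_true) / y_true.size

y_true = np.array([4.526, 3.585, 3.521])   # first three MedHouseVal targets
y_pred = np.array([4.0, 3.0, 4.0])         # hypothetical predictions

print(mse_loss(y_true, y_pred))
print(mse_grad(y_true, y_pred))            # negative where the prediction is too low
```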


In [17]:
### GRADED
tf.random.set_seed(42)
model.compile(loss='mse', metrics=['mse'])

### ANSWER CHECK
print(model.loss)

mse


[Back to top](#-Index)

### Problem 3

#### Training the model

Use the method `fit()` to fit your `model` to the `X` and `y` data. Set the argument `validation_split` equal to `0.2`, the argument `epochs` equal to `20`, and the argument `verbose` to `0`. Assign the result to the variable `history`.
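
According to the Keras documentation, `validation_split` holds out the *last* fraction of the samples, taken before any shuffling, and the model never trains on those rows. By default `fit()` shuffles only the remaining training rows each epoch. The sketch below mimics that split with a hypothetical helper, `split_like_validation_split`, using the 20,640-row size of this dataset. The exact rounding Keras uses for the split point is an assumption here, but for 20,640 × 0.8 the result is a whole number either way.

```python
import numpy as np

def split_like_validation_split(X, y, validation_split=0.2):
    """Hypothetical helper: hold out the LAST `validation_split` fraction of rows,
    mirroring the documented behaviour of Keras' validation_split."""
    n_train = int(len(X) * (1.0 - validation_split))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

n = 20640                                  # rows in the California housing data
X_demo = np.arange(n).reshape(-1, 1)       # stand-in features: each row's index
y_demo = np.zeros(n)
(train_X, _), (val_X, _) = split_like_validation_split(X_demo, y_demo)
print(len(train_X), len(val_X))
print(val_X[0, 0], val_X[-1, 0])           # indices of the first and last held-out rows
```

Because the held-out rows are simply the final ones, any ordering in the data carries over into the validation set.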


In [21]:
### GRADED
tf.random.set_seed(42)
history = model.fit(X, y, validation_split=0.2, epochs=20, verbose=0)

### ANSWER CHECK
print(history.history['mse'][-1])

0.30591723322868347


[Back to top](#-Index)

### Problem 4

#### Comparing to `LinearRegression`

Compare the neural network's mean squared error with that of a `LinearRegression` model. Fit a `LinearRegression` on the full dataset `X`, `y` and assign the fitted model to the variable `lr`.

Finally, use the function `mean_squared_error` with the arguments `y` and `lr.predict(X)` to compute the linear model's error. Assign the result to the variable `lr_mse`.
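
`LinearRegression` fits ordinary least squares. It finds the intercept and slopes that minimize the sum of squared residuals, which is the same quantity the neural network's `mse` loss averages. The sketch below solves that problem directly with `np.linalg.lstsq` on synthetic data, not the housing set. It is roughly what `LinearRegression` does, not its actual implementation, and `ols_fit` is a hypothetical helper name.

```python
import numpy as np

def ols_fit(X, y):
    """Hypothetical helper: least-squares fit with an intercept via np.linalg.lstsq."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]                     # intercept, slopes

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))                # synthetic features
true_slopes = np.array([2.0, -1.0, 0.5])
y_syn = 1.5 + X_syn @ true_slopes + rng.normal(scale=0.1, size=200)  # small Gaussian noise

intercept, slopes = ols_fit(X_syn, y_syn)
train_mse = np.mean((y_syn - (intercept + X_syn @ slopes)) ** 2)    # training-set MSE
print(round(intercept, 2), slopes.round(2), round(train_mse, 4))
```

A linear model can only fit straight-line relationships in the scaled features. The ReLU hidden layer lets the network represent nonlinear ones, which is one reason its error can be lower.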

In [23]:
from sklearn.metrics import mean_squared_error

In [25]:
### GRADED
lr = LinearRegression().fit(X, y)
lr_mse = mean_squared_error(y, lr.predict(X))

### ANSWER CHECK
print(lr_mse)

0.5243209861846072
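
MSE is measured in squared target units, and the target is in units of $100,000. Taking the square root (RMSE) puts the error back on the price scale. The sketch below converts the two MSE values printed above. For linear regression the RMSE is about 0.72, a typical error on the order of $72,000. The two figures are not measured on the same rows, though. `history.history['mse'][-1]` is a training metric, averaged over the batches of the final epoch on the first 80% of rows. `lr_mse` is computed on every row that `lr` was fitted to. A fairer like-for-like check would compare `history.history['val_mse'][-1]` with a linear model evaluated on the same held-out final 20% of rows.

```python
import math

# MSE values copied from the notebook outputs (target units: $100,000)
mse_values = {
    "neural network (last-epoch training MSE)": 0.30591723322868347,
    "linear regression (full X, y)": 0.5243209861846072,
}

# RMSE is back in the target's units; multiply by 100,000 to express it in dollars
rmse_values = {name: math.sqrt(mse) for name, mse in mse_values.items()}
for name, rmse in rmse_values.items():
    print(f"{name}: RMSE = {rmse:.4f}, roughly ${rmse * 100_000:,.0f}")
```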
