# Neural-Network-Regression-with-California-Housing.

In [182]:
from sklearn.datasets import fetch_california_housing
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.utils import to_categorical
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
from keras.utils import to_categorical

## What changes if I use relu vs tanh activations in hidden layers?

First of all I will give you the big picture the Activations fanctions like ReLU and tanh: <br>

**What are they exactly?** <br>
<br>
They are functions that work as checkpoints (neurons) and pass the signals (numerical outputs) between them. Each neuron first calculates a weighted sum of its inputs, and then the activation function transforms this value into an output. For example, if the raw signal is 1, the tanh function will squash it to about 0.76, while ReLU will let it pass through unchanged. In this way, activation functions control how much of the signal continues from one neuron to the next until the final output.

#### Answer the question:

The changes will be in the activation value of the neuron:<br>
<br>
**if i use ReLU:** the activation value of the neuron will be always positive, and the value unchange will be same for the raw signal. <br>
<br>
**if i use tanh:** the the activation value of the neuron will be squashed between the range of (-1 to 0) the smallest value will be near to -1 and the largest value will be near to 1, they are not come to exact 1.

## Why should the output layer use a linear activation instead of sigmoid?

Depend on the type of problems or requirenment we choose the type of the activation, and we have two types of activation: <br>
<br>
**1. Regression Activation.** <br>
- We use this type of activation if we need to predict the continuous values which are larger than 1.
<br>
<br>

**2. Classification Activation.** <br>
- We use this type of activation if our output is probability and category

#### Answer the question:

As I explained above, so if we want to get the prices which is continuous number so the best activation type  is "Linear"

## Do deeper models (more layers) always improve prediction accuracy?

The main point of saying it increases the accuracy is depend on the how complex the data set is, and increasing the hidden layers can be both negative and positive effects: <br>
<br>
If we increase the hidded layers in the complex data that will lead to the better accuracy. Why? because it makes the model learn more patterns within the dataset, and vise versa, if we increase the hidden layers in the dataset which have few rows that will lead to the overfitting rather than learning from the patterns and this willl made the model to memorize and then in testing phase the accuracy will be lower than training.

## How do different metrics (MSE, MAE, RMSE) affect my interpretation of results?

MSE and RMSE highlight big mistakes (RMSE is just easier to read since it’s in the same units as the target), while MAE gives the average error size and is more forgiving of outliers. In practice, MAE tells you the “typical miss,” and RMSE shows how bad the bigger misses can get.

## Coding Time

#### Data Load

In [183]:
data = fetch_california_housing()

In [184]:
X, y = data.data, data.target

#### Data Splitting

In [185]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
y_train = to_categorical(y_train, 10)
y_test  = to_categorical(y_test, 10)

- Good overlap

### Scaling

In [186]:
scaler = StandardScaler() # The job of it it keep the mean = 0 and the Std = 1
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

### Build a regression model with at least two hidden layers.

In [187]:
model = Sequential([
    Flatten(input_shape=(X_train.shape[1],)),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(1)
])

  super().__init__(**kwargs)


In [188]:
model.summary()

### Try different optimizers (adam, sgd) and compare training time.

##### An optimizer is the method the neural network uses to update its weights during training so that it gets better at predicting. <br>
1. First it assign the random weights in the learnable links. <br>
2. The optimizer adjusts these weights step by step to minimize the loss function (the error between predictions and actual values).

In [192]:
model.compile(optimizer= "adam" , loss="mse", metrics=["mae"])
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.1, verbose=1)

Epoch 1/5
[1m465/465[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0900 - mae: 0.1801 - val_loss: 0.0900 - val_mae: 0.1800
Epoch 2/5
[1m465/465[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 0.0900 - mae: 0.1800 - val_loss: 0.0900 - val_mae: 0.1800
Epoch 3/5
[1m465/465[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0900 - mae: 0.1800 - val_loss: 0.0900 - val_mae: 0.1800
Epoch 4/5
[1m465/465[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0900 - mae: 0.1800 - val_loss: 0.0900 - val_mae: 0.1799
Epoch 5/5
[1m465/465[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 1ms/step - loss: 0.0900 - mae: 0.1800 - val_loss: 0.0900 - val_mae: 0.1804


In [190]:
model.compile(optimizer= "sgd" , loss="mse", metrics=["mae"])
history = model.fit(X_train, y_train, epochs=5, batch_size=5, validation_split=0.1, verbose=1)

Epoch 1/5
[1m2972/2972[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 834us/step - loss: 0.2975 - mae: 0.2117 - val_loss: 0.1074 - val_mae: 0.1939
Epoch 2/5
[1m2972/2972[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 849us/step - loss: 0.0916 - mae: 0.1801 - val_loss: 0.0902 - val_mae: 0.1789
Epoch 3/5
[1m2972/2972[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 796us/step - loss: 0.0902 - mae: 0.1800 - val_loss: 0.0901 - val_mae: 0.1804
Epoch 4/5
[1m2972/2972[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 867us/step - loss: 0.0901 - mae: 0.1800 - val_loss: 0.0901 - val_mae: 0.1797
Epoch 5/5
[1m2972/2972[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 880us/step - loss: 0.0901 - mae: 0.1800 - val_loss: 0.0901 - val_mae: 0.1806


### Report metrics like MSE and MAE on training vs validation sets.

In [191]:
hist_df = pd.DataFrame(history.history)

# Training metrics
train_mse = hist_df["loss"]      # because "loss" = MSE here
train_mae = hist_df["mae"]

# Validation metrics
val_mse = hist_df["val_loss"]    # val_loss = validation MSE
val_mae = hist_df["val_mae"]

# Report last epoch values
print(f"Final Training MSE: {train_mse.iloc[-1]:.4f}")
print(f"Final Training MAE: {train_mae.iloc[-1]:.4f}")
print(f"Final Validation MSE: {val_mse.iloc[-1]:.4f}")
print(f"Final Validation MAE: {val_mae.iloc[-1]:.4f}")

Final Training MSE: 0.0901
Final Training MAE: 0.1800
Final Validation MSE: 0.0901
Final Validation MAE: 0.1806
