# Linear Regression 
### Using Neural Networks
- Multilayer perceptron (MLP)

```python
import tensorflow as tf
from tensorflow import keras

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
```

### Download the data and split it into training, validation, and test sets

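```python
housing = fetch_california_housing()

# Hold out 25% of the rows (the default test_size) as the test set.
# random_state fixes the shuffle so the split is reproducible.
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target, random_state=42)

# Split the remaining rows again: 25% of them become the validation set
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)
```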
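With the default `test_size=0.25`, the first call holds out 25% of the 20,640 rows as the test set, and the second call moves 25% of what remains into the validation set. The sketch below checks that arithmetic on an index array standing in for `housing.data` (an assumption made so nothing needs downloading; `random_state=42` is an arbitrary choice). The resulting 11,610 training rows match the `X_train.shape` output shown further down, and 3,870 matches the validation size in the training log.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 20640                # rows in the California housing dataset
indices = np.arange(n_samples)   # stand-in for housing.data (assumption)

# First split: default test_size=0.25 -> 5,160 test rows
train_full_idx, test_idx = train_test_split(indices, random_state=42)
# Second split of the remaining 15,480 rows: 25% -> 3,870 validation rows
train_idx, valid_idx = train_test_split(train_full_idx, random_state=42)

print(len(train_idx), len(valid_idx), len(test_idx))  # 11610 3870 5160
```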



### Preprocess the data using StandardScaler()
#### Standardizing the features usually speeds up the convergence of the optimizer (here, gradient descent).
- scikit-learn's `StandardScaler()`

    - The main idea is to standardize the features/variables/columns of X (mean = 0 and standard deviation = 1)
      before applying machine learning techniques.

    - Keep in mind that most (if not all) scikit-learn models, classes, and functions expect as input a matrix X
      with shape [number_of_samples, number_of_features]. This is very important: some other libraries expect
      the transpose, [number_of_features, number_of_samples].

    - IMPORTANT: `StandardScaler()` standardizes each feature (each column of X) INDIVIDUALLY, so that every
      column/feature/variable ends up with mean = 0 and standard deviation = 1.

    - Note: it is not correct to say that "each value in the dataset will have the sample mean value subtracted"
      if that means a single mean for the whole dataset. Each value has the mean of its own column subtracted
      and is divided by the standard deviation of its own column.
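A minimal sketch of the shape convention, using small made-up arrays (the variable names are illustrative only): a single feature must be reshaped into one column, and data stored as [number_of_features, number_of_samples] must be transposed before it reaches `StandardScaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature stored as a flat 1-D array of 5 samples
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
# scikit-learn expects (n_samples, n_features), so make it a (5, 1) column
column_scaled = StandardScaler().fit_transform(feature.reshape(-1, 1))
print(column_scaled.shape)  # (5, 1)

# Two features stored "the other way round": shape (n_features, n_samples) = (2, 5)
features_by_row = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                            [10.0, 20.0, 30.0, 40.0, 50.0]])

# Correct: transpose to (5, 2) so that each column is one feature
right = StandardScaler().fit_transform(features_by_row.T)
print(right.shape)  # (5, 2)

# Wrong: without the transpose, each of the 5 samples is treated as a feature
# with only 2 observations, so every entry collapses to -1 or +1
wrong = StandardScaler().fit_transform(features_by_row)
print(wrong)
```

Skipping the transpose raises no error, which is what makes this mistake easy to miss. Passing the flat 1-D `feature` array directly, by contrast, makes scikit-learn raise a `ValueError` asking for a 2-D array.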
      
The idea behind `StandardScaler` is to transform the data so that its distribution has a mean of 0 and a standard deviation of 1. For multivariate data this is done feature-wise, that is, independently for each column of the data: each value has the mean of its column subtracted and is then divided by the standard deviation of that column,

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j},$$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$, computed on the data passed to `fit` (here, the training set).
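The column-wise formula can be checked against NumPy on synthetic data (an assumption; the housing features are not used here). `StandardScaler` uses the population standard deviation (`ddof=0`), which is also the default of `np.std`, and it exposes the fitted statistics as `mean_` and `scale_`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X = np.column_stack([rng.normal(50.0, 10.0, size=100),
                     rng.normal(0.5, 0.01, size=100)])

# Manual standardization, column by column: z = (x - mu_j) / sigma_j
mu = X.mean(axis=0)      # one mean per column
sigma = X.std(axis=0)    # population standard deviation (ddof=0) per column
Z_manual = (X - mu) / sigma

scaler = StandardScaler().fit(X)
Z_sklearn = scaler.transform(X)

print(np.allclose(Z_manual, Z_sklearn))  # True
print(np.allclose(scaler.mean_, mu), np.allclose(scaler.scale_, sigma))  # True True
```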

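```python
scaler = StandardScaler()

# Learn the per-feature mean and standard deviation from the training set only...
X_train = scaler.fit_transform(X_train)
# ...and reuse those training statistics for the validation and test sets,
# so no information from them leaks into the preprocessing
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)
```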
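One way to make the earlier claim about faster convergence concrete: for a linear model trained with MSE, how quickly gradient descent converges depends on the condition number (largest over smallest eigenvalue) of the loss Hessian $\frac{2}{n}X^\top X$. The sketch below compares that condition number before and after standardization on two synthetic features with very different scales; the feature values and the omission of a bias term are simplifying assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features on very different scales (e.g. a small ratio vs. a large count)
X_raw = np.column_stack([rng.normal(5.0, 1.0, size=1000),
                         rng.normal(1500.0, 500.0, size=1000)])

def mse_hessian_condition(X):
    """Condition number of the MSE Hessian (2/n) X^T X for a linear model without bias."""
    H = 2.0 / len(X) * X.T @ X
    return np.linalg.cond(H)

X_scaled = StandardScaler().fit_transform(X_raw)
cond_raw = mse_hessian_condition(X_raw)
cond_scaled = mse_hessian_condition(X_scaled)
print(cond_raw)     # orders of magnitude above 1: long, narrow loss valley
print(cond_scaled)  # close to 1: nearly round loss contours
```

A large condition number forces a small learning rate (to stay stable along the steep direction) while progress along the flat direction is slow; standardizing brings the ratio close to 1.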

```python
X_train.shape  # (11610, 8): 11,610 training samples, 8 features
```

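### Build a Sequential Keras model
- hidden layer: 30 neurons, ReLU activation
- loss: mean squared error (MSE)
- optimizer: stochastic gradient descent (SGD)
- output layer: 1 neuron
- output layer: no activation function, because this is a regression task and the output should be an unbounded real value (the ReLU hidden layer still makes the model as a whole nonlinear)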
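To make the loss and optimizer concrete, here is a minimal NumPy sketch of MSE and of a single plain SGD update. For brevity it assumes a *linear* model, which differs from the network below (for that network Keras backpropagates the gradient through both Dense layers). The learning rate of 0.01 mirrors the default of `keras.optimizers.SGD`; the data are synthetic.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

def sgd_step(w, b, X_batch, y_batch, learning_rate=0.01):
    """One plain SGD update for a linear model y_hat = X @ w + b under MSE loss."""
    n = len(X_batch)
    residual = X_batch @ w + b - y_batch           # y_hat - y, shape (n,)
    grad_w = (2.0 / n) * (X_batch.T @ residual)    # dL/dw
    grad_b = (2.0 / n) * residual.sum()            # dL/db
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Usage on one synthetic mini-batch of 32 standardized samples with 8 features
rng = np.random.default_rng(0)
X_batch = rng.normal(size=(32, 8))
y_batch = X_batch @ rng.normal(size=8) + 1.0       # hypothetical linear targets
w, b = np.zeros(8), 0.0

loss_before = mse(y_batch, X_batch @ w + b)
w, b = sgd_step(w, b, X_batch, y_batch)
loss_after = mse(y_batch, X_batch @ w + b)
print(loss_before > loss_after)  # True: the step moved downhill
```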

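```python
model = keras.models.Sequential([
    # Hidden layer: 30 units with ReLU; input_shape is (8,), one entry per feature
    keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    # Output layer: a single unit with no activation (linear output for regression)
    keras.layers.Dense(1)
])
# "sgd" selects the SGD optimizer with its default settings
model.compile(loss="mean_squared_error", optimizer="sgd")
```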
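The architecture can be sketched as a NumPy forward pass with the same layer shapes: a ReLU hidden layer of 30 units followed by one output unit with no activation. The weights are random placeholders (not Keras's Glorot-uniform initialization, and not trained), so only the shapes and the parameter count are meaningful; `model.summary()` should report the same 301 trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 30

# Placeholder weights with the same shapes Keras uses for Dense(30) and Dense(1)
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """Forward pass: ReLU hidden layer, then an output layer with no activation."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    return hidden @ W2 + b2

X_batch = rng.normal(size=(4, n_features))
print(forward(X_batch).shape)  # (4, 1)

# Trainable parameters: (8*30 weights + 30 biases) + (30*1 weights + 1 bias)
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 301
```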


### Train the model

```python
history = model.fit(X_train, y_train, epochs=20,
                    validation_data=(X_valid, y_valid))
```

```
Train on 11610 samples, validate on 3870 samples
Epoch 1/20
...
Epoch 20/20
```
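Each epoch in the training log is one full pass over the 11,610 training samples in mini-batches. `model.fit` uses `batch_size=32` and shuffles the training data every epoch unless told otherwise. The sketch below is a simplified stand-in for that batching (not Keras's actual data pipeline) that counts the resulting gradient updates.

```python
import math
import numpy as np

def iterate_minibatches(n_samples, batch_size=32, rng=None):
    """Yield shuffled index batches that cover every sample exactly once (one epoch)."""
    rng = rng if rng is not None else np.random.default_rng()
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

n_train = 11610  # X_train.shape[0] in this notebook
batches = list(iterate_minibatches(n_train, rng=np.random.default_rng(0)))

print(len(batches), math.ceil(n_train / 32))  # 363 363
print(len(batches[-1]))                        # 26 = 11610 - 362 * 32
print(20 * len(batches))                       # 7260 weight updates over 20 epochs
```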


```python
mse_test = model.evaluate(X_test, y_test)  # test-set MSE (the compiled loss)
X_new = X_test[:3]  # pretend these are new instances
y_pred = model.predict(X_new)  # shape (3, 1): one prediction per instance
```
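`model.evaluate` returns the compiled loss, so `mse_test` is an MSE in squared target units, and `model.predict` returns an array of shape `(3, 1)`. The hypothetical helper below (not part of Keras) converts the MSE to an RMSE in dollars, since the California housing target is expressed in units of 100,000 US dollars, and flattens the predictions before comparing them with the labels. The toy inputs are illustrative only, not results from this model.

```python
import numpy as np

def summarize_predictions(mse_test, y_pred, y_true):
    """Hypothetical helper: RMSE in dollars plus per-instance prediction errors."""
    rmse = np.sqrt(mse_test)              # undo the squaring: back to target units
    rmse_dollars = rmse * 100_000         # the target is in units of 100,000 dollars
    y_pred = np.asarray(y_pred).ravel()   # predict() returns (n, 1); flatten to (n,)
    errors = y_pred - np.asarray(y_true)  # without ravel(), broadcasting would give (n, n)
    return rmse_dollars, errors

# Toy inputs for illustration only
rmse_dollars, errors = summarize_predictions(
    0.25, np.array([[1.0], [2.0], [3.0]]), np.array([1.5, 2.0, 2.5]))
print(rmse_dollars)  # 50000.0
print(errors)
```

In the notebook it would be called as `summarize_predictions(mse_test, y_pred, y_test[:3])`.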

