##### Loss functions for supervised learning

1. Regression: L2-norm loss
2. Classification: cross-entropy loss

The target is the desired value we are aiming for.
    
We want our output to be as close as possible to the target. In the cats vs dogs example, the targets would be the validated labels we assign.

The Y values are the outputs of our model. The machine learning algorithm aims to find a function of X that outputs values as close to the targets as possible.

In this notation, the loss function evaluates the accuracy of the outputs with respect to the targets.
    

In short: the loss measures how accurately the output Y = f(X) matches the target T.

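#### L2-norm

Also called the squared loss (strictly, it is the square of the L2 norm of the difference between outputs and targets). Minimizing it is what the least-squares method does.

$$L(y,t) = \sum_i (y_i - t_i)^2$$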
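A minimal NumPy sketch of the squared (L2-norm) loss above, assuming the outputs `y` and targets `t` are arrays of the same shape. The function name `l2_loss` is just an illustrative choice.

```python
import numpy as np

def l2_loss(y, t):
    """Squared L2-norm loss: the sum over i of (y_i - t_i)**2."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return np.sum((y - t) ** 2)

# Outputs equal to the targets give zero loss; worse outputs give a larger loss.
print(l2_loss([1.0, 2.0], [1.0, 2.0]))            # 0.0
print(l2_loss([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))  # 0 + 1 + 4 = 5.0
```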

#### Cross-entropy

The cross-entropy loss is minus the sum of the targets times the natural log of the outputs:

$$L(y,t) = -\sum_i t_i \ln(y_i)$$
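A small sketch of the cross-entropy formula for the cats vs dogs case, assuming a one-hot target and outputs that are probabilities strictly greater than 0 (for example from a softmax), since ln(0) is undefined. The function name and the numbers are illustrative.

```python
import numpy as np

def cross_entropy(y, t):
    """Minus the sum of the targets times the natural log of the outputs."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return -np.sum(t * np.log(y))

# One-hot target [cat, dog]: this image is a dog.
t = [0.0, 1.0]
good = [0.1, 0.9]   # most probability on the correct class
bad = [0.7, 0.3]    # most probability on the wrong class

print(cross_entropy(good, t))  # -ln(0.9), small
print(cross_entropy(bad, t))   # -ln(0.3), larger
```

Only the term for the true class survives, so the loss is low when the model puts high probability on the correct label.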

#### Optimization Algorithm

Gradient descent is built on the gradient, the multivariate generalization of the derivative.

The actual optimization happens when the optimization algorithm varies the model's parameters until the loss function is minimized. In the context of the linear model, this means varying W and B. The simplest and most fundamental optimization algorithm is gradient descent.

Let's first consider a non-machine-learning example to understand the logic behind gradient descent.

Here is a function f of x equal to five times x squared, plus three times x, minus four:

    f(x) = 5x**2 + 3x - 4

Our goal is to find the minimum of this function using gradient descent.
The first step is to find the first derivative of the function.

In our case it is ten times x plus three.

    f'(x) = 10x + 3
    update rule: x_{i+1} = x_i - learning_rate * f'(x_i)
    if x_0 = 4, then f'(x_0) = 10*4 + 3 = 43, so
    x_1 = 4 - learning_rate * 43


Using the update rule, we can find x_2, x_3 and so on.

If the learning rate is small enough, after enough updates the values eventually stop changing.
That is how we know we have reached the minimum: each step is proportional to f'(x), and the first derivative is zero at the minimum.
Because this f is a convex quadratic, the only point where f'(x) = 0 is the GLOBAL minimum, x = -3/10.
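A minimal sketch of the update rule applied to f(x) = 5x² + 3x - 4, starting from x_0 = 4 as above. The learning rate of 0.05 and the stopping tolerance are assumptions chosen for illustration.

```python
def f(x):
    return 5 * x**2 + 3 * x - 4

def f_prime(x):
    return 10 * x + 3   # first derivative of f

def gradient_descent(x0, learning_rate, max_steps=1000, tol=1e-10):
    """Apply x_{i+1} = x_i - learning_rate * f'(x_i) until x stops changing."""
    xs = [x0]
    for _ in range(max_steps):
        x_next = xs[-1] - learning_rate * f_prime(xs[-1])
        xs.append(x_next)
        if abs(x_next - xs[-2]) < tol:   # updates have (numerically) stopped
            break
    return xs

xs = gradient_descent(x0=4.0, learning_rate=0.05)
print(xs[1])              # x_1 = 4 - 0.05 * 43, about 1.85
print(xs[-1], f(xs[-1]))  # approaches x = -0.3, f(x) = -4.45
```

Since f''(x) = 10, each step multiplies the distance to the minimum by (1 - 10 * learning_rate). This sketch therefore converges only for 0 < learning_rate < 0.2: with 0.1 it lands on -0.3 in a single step, and above 0.2 the iterates diverge.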

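#### N-parameter gradient descent

Model: y = xw + b. For each observation i, the output y_i = x_i w + b is compared with its target t_i.

The function that measures this comparison goes by several names, which are commonly used interchangeably:

    L(y,t)      -> loss function
    C(y,t)      -> cost function
    E(y,t)      -> error function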

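Let's look at the L2-norm loss:

$$L(y,t) = \frac{1}{2}\sum_i (y_i - t_i)^2$$

Dividing by 2 does not change where the minimum is; it only cancels the factor of 2 that appears when differentiating, so $\partial L / \partial y_i = y_i - t_i$.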

Any function that is higher for worse results and lower for better results can serve as a loss function.
    
##### Update Rule

$$x_{i+1} = x_i - \eta\, f'(x_i)$$

$$w_{i+1} = w_i - \eta\, \nabla_w L(y,t)$$

$$b_{i+1} = b_i - \eta\, \nabla_b L(y,t)$$

where $\eta$ is the learning rate.
    
The first derivative $f'(x_i)$ is replaced by the gradient of the loss. For the weights, $w_{i+1}$ equals $w_i$ minus $\eta$ (the learning rate) times the gradient of the loss function with respect to $w_i$.

For the biases, $b_{i+1}$ equals $b_i$ minus $\eta$ times the gradient of the loss function with respect to $b$. It is basically the same rule as in the one-variable case, but applied to a matrix $W$ and a vector $B$ instead of a single number $x$.
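A minimal NumPy sketch of N-parameter gradient descent for the linear model y = xw + b with the halved L2-norm loss. For this loss the gradients are $\nabla_w L = X^T (y - t)$ and $\nabla_b L = \sum_i (y_i - t_i)$. The synthetic data, the "true" weights and bias, the learning rate and the number of steps are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data: 100 observations, 2 input variables.
n_obs, n_features = 100, 2
xs = rng.uniform(-1, 1, size=(n_obs, n_features))
true_w = np.array([[2.0], [-3.0]])   # weights we hope to recover
true_b = 5.0                          # bias we hope to recover
noise = rng.normal(0, 0.05, size=(n_obs, 1))
targets = xs @ true_w + true_b + noise

# Start from arbitrary parameters.
w = np.zeros((n_features, 1))
b = 0.0
eta = 0.005   # learning rate (assumed; a value that is too large makes this diverge)

for step in range(1000):
    outputs = xs @ w + b                  # y = xw + b for every observation
    deltas = outputs - targets            # y_i - t_i
    loss = 0.5 * np.sum(deltas ** 2)      # L2-norm loss, halved
    grad_w = xs.T @ deltas                # gradient of L with respect to w
    grad_b = np.sum(deltas)               # gradient of L with respect to b
    w = w - eta * grad_w                  # w_{i+1} = w_i - eta * grad_w
    b = b - eta * grad_b                  # b_{i+1} = b_i - eta * grad_b

print(w.ravel(), b, loss)
```

The learned w and b should end up close to [2, -3] and 5; the remaining loss comes from the added noise.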


To summarize: to minimize the loss $L(y,t) = \frac{1}{2}\sum_i (y_i - t_i)^2$, we repeatedly apply $w_{i+1} = w_i - \eta \nabla_w L(y,t)$ and $b_{i+1} = b_i - \eta \nabla_b L(y,t)$, where $\eta$ is the learning rate.

### Import relevant libraries

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

import keras
import tensorflow as tf

import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.models import load_model
from sklearn.preprocessing import MinMaxScaler

keras.__version__
```

```
'2.6.0'
```

```python
# Load training data set from CSV file
training_data_df = pd.read_csv("sales_data_training.csv")

# Load testing data set from CSV file
test_data_df = pd.read_csv("sales_data_test.csv")
```


```python
# Data needs to be scaled to a small range like 0 to 1 for the neural
# network to work well.
scaler = MinMaxScaler(feature_range=(0, 1))

# Scale both the training inputs and outputs.
# Fit on the training data, then apply the same transformation to the test data.
scaled_training = scaler.fit_transform(training_data_df)
scaled_testing = scaler.transform(test_data_df)
```
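What `MinMaxScaler` does per column can be checked on a tiny example. This sketch uses hypothetical toy numbers, not the sales data: with `feature_range=(0, 1)` each value becomes `x * scale_ + min_`, where `scale_ = 1 / (data_max_ - data_min_)` and `min_ = -data_min_ * scale_`, computed from the training data only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data with two columns.
train = np.array([[10.0, 200.0],
                  [20.0, 400.0],
                  [30.0, 600.0]])
test = np.array([[25.0, 700.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_train = scaler.fit_transform(train)   # min/max learned from train only
scaled_test = scaler.transform(test)         # same scale_ and min_ reused

# The same result by hand, column by column: x * scale_ + min_
manual = train * scaler.scale_ + scaler.min_

print(scaled_train)
print(scaled_test)   # 700 is above the training max, so it maps above 1
```

Because the scaler is fit only on the training data, test values outside the training range can land outside 0 to 1; here the second test column becomes 1.25.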


```python
# Print out the adjustment that the scaler applied to the total_earnings column of data
print("Note: total_earnings values were scaled by multiplying by {:.10f} and adding {:.6f}".format(scaler.scale_[8], scaler.min_[8]))

# Create new pandas DataFrame objects from the scaled data
scaled_training_df = pd.DataFrame(scaled_training, columns=training_data_df.columns.values)
scaled_testing_df = pd.DataFrame(scaled_testing, columns=test_data_df.columns.values)
```

```
Note: total_earnings values were scaled by multiplying by 0.0000036968 and adding -0.115913
```


```python
# Save scaled data dataframes to new CSV files
scaled_training_df.to_csv("sales_data_training_scaled.csv", index=False)
scaled_testing_df.to_csv("sales_data_testing_scaled.csv", index=False)
```

```python
training_data_df = pd.read_csv("sales_data_training_scaled.csv")

X = training_data_df.drop('total_earnings', axis=1).values
Y = training_data_df[['total_earnings']].values
```


```python
# Define the model
model = Sequential()
model.add(Dense(50, input_dim=9, activation='relu'))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))
model.compile(loss="mean_squared_error", optimizer="adam")
```
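Each `Dense` layer is still the linear model xw + b from earlier, followed by an activation: it computes activation(x @ W + b). A NumPy sketch of the forward pass for the same layer sizes, using hypothetical random weights instead of trained ones, so only the shapes and the parameter count are meaningful here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Layer sizes matching the model above: 9 inputs -> 50 -> 100 -> 50 -> 1 output.
sizes = [9, 50, 100, 50, 1]
# Hypothetical random weights; a trained model would have learned these.
params = [(rng.normal(0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b                                 # linear model xw + b per layer
        h = z if i == len(params) - 1 else relu(z)    # linear output, ReLU elsewhere
    return h

batch = rng.uniform(0, 1, size=(4, 9))   # 4 hypothetical scaled input rows
print(forward(batch).shape)              # (4, 1)
print(sum(W.size + b.size for W, b in params))
```

Counting the entries of every W and b gives 10,701 trainable parameters, which is what `model.summary()` should report for this architecture.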

```python
# Train the model
model.fit(
    X,
    Y,
    epochs=50,
    shuffle=True,
    verbose=2  # print one summary line per epoch
)
```


```
Epoch 1/50
32/32 - 0s - loss: 0.0245
Epoch 2/50
32/32 - 0s - loss: 0.0029
Epoch 3/50
32/32 - 0s - loss: 0.0011
Epoch 4/50
32/32 - 0s - loss: 5.4315e-04
Epoch 5/50
32/32 - 0s - loss: 3.3731e-04
Epoch 6/50
32/32 - 0s - loss: 2.3508e-04
Epoch 7/50
32/32 - 0s - loss: 2.0049e-04
Epoch 8/50
32/32 - 0s - loss: 1.3711e-04
Epoch 9/50
32/32 - 0s - loss: 1.1317e-04
Epoch 10/50
32/32 - 0s - loss: 1.0248e-04
Epoch 11/50
32/32 - 0s - loss: 9.6386e-05
Epoch 12/50
32/32 - 0s - loss: 7.0687e-05
Epoch 13/50
32/32 - 0s - loss: 6.5732e-05
Epoch 14/50
32/32 - 0s - loss: 5.3585e-05
Epoch 15/50
32/32 - 0s - loss: 5.1114e-05
Epoch 16/50
32/32 - 0s - loss: 4.7881e-05
Epoch 17/50
32/32 - 0s - loss: 4.1606e-05
Epoch 18/50
32/32 - 0s - loss: 3.9282e-05
Epoch 19/50
32/32 - 0s - loss: 3.7770e-05
Epoch 20/50
32/32 - 0s - loss: 3.7655e-05
Epoch 21/50
32/32 - 0s - loss: 3.2275e-05
Epoch 22/50
32/32 - 0s - loss: 3.2085e-05
Epoch 23/50
32/32 - 0s - loss: 2.9304e-05
Epoch 24/50
32/32 - 0s - loss: 3.5047e-05
Epoch 25/50
3
```


```python
# Load the separate test data set
test_data_df = pd.read_csv("sales_data_testing_scaled.csv")

X_test = test_data_df.drop('total_earnings', axis=1).values
Y_test = test_data_df[['total_earnings']].values

test_error_rate = model.evaluate(X_test, Y_test, verbose=0)
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))
```

```
The mean squared error (MSE) for the test data set is: 0.0001405855582561344
```
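The reported MSE is in scaled units (it is the L2-norm loss from earlier, averaged over observations instead of summed, without the 1/2). A short sketch converting it into a root-mean-square error in dollars, assuming the rounded `total_earnings` scale factor printed earlier. Min-max scaling is affine, so an error of e in scaled units is e / scale in dollars; the added offset cancels out of a difference.

```python
import math

# Values copied from the outputs above.
test_mse_scaled = 0.0001405855582561344   # model.evaluate(...) on the scaled test set
earnings_scale = 0.0000036968             # multiplier applied to total_earnings

# x_scaled = x * scale + min, so (x1_scaled - x2_scaled) = (x1 - x2) * scale.
test_rmse_scaled = math.sqrt(test_mse_scaled)
test_rmse_dollars = test_rmse_scaled / earnings_scale
print(f"Test RMSE: about ${test_rmse_dollars:,.0f}")
```

With these numbers the root-mean-square error comes out to roughly $3,200, which is easier to judge than 0.00014.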


```python
# Same evaluation with verbose=1 (progress bar)
test_error_rate = model.evaluate(X_test, Y_test, verbose=1)
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))
```

```
The mean squared error (MSE) for the test data set is: 0.0001405855582561344
```


```python
# Same evaluation with verbose=2 (one summary line)
test_error_rate = model.evaluate(X_test, Y_test, verbose=2)
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))
```

```
13/13 - 0s - loss: 1.4059e-04
The mean squared error (MSE) for the test data set is: 0.0001405855582561344
```


```python
# Load the data we want to use to make a prediction
X = pd.read_csv("proposed_new_product.csv").values

# Make a prediction with the neural network
prediction = model.predict(X)
```


```python
# Grab just the first element of the first prediction (since we only have one)
prediction = prediction[0][0]
```


```python
# Re-scale the data from the 0-to-1 range back to dollars
# These constants are from when the data was originally scaled down to the 0-to-1 range:
# undo x_scaled = x * 0.0000036968 - 0.115913 (constants rounded from the scaler output above)
prediction = prediction + 0.1159
prediction = prediction / 0.0000036968

print("Earnings Prediction for Proposed Product - ${}".format(prediction))
```

```
Earnings Prediction for Proposed Product - $264378.67422997503
```
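The cell above hard-codes rounded constants. While the fitted scaler is still in memory, the same inversion can use its stored `scale_` and `min_` attributes, which avoids the rounding. A sketch on hypothetical toy data, where column 1 plays the role of `total_earnings` (for the sales data, the earlier print statement uses column index 8).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: column 1 stands in for total_earnings.
data = np.array([[1.0, 100000.0],
                 [2.0, 300000.0],
                 [3.0, 200000.0]])
scaler = MinMaxScaler(feature_range=(0, 1)).fit(data)
col = 1

scaled_prediction = 0.75   # pretend model output in the 0-to-1 range

# Invert x_scaled = x * scale_ + min_ for a single column.
dollars = (scaled_prediction - scaler.min_[col]) / scaler.scale_[col]

# Cross-check with inverse_transform, which needs a full row (other columns ignored here).
via_sklearn = scaler.inverse_transform(np.array([[0.0, scaled_prediction]]))[0, col]

print(dollars, via_sklearn)   # both about 250000
```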


```python
# Save the model to disk
model.save("trained_model.h5")
print("Model saved to disk.")
```

```
Model saved to disk.
```


```python
# Load the saved model and repeat the prediction with it
model = load_model('trained_model.h5')

X = pd.read_csv("proposed_new_product.csv").values
prediction = model.predict(X)

# Grab just the first element of the first prediction (since we only have one)
prediction = prediction[0][0]

# Re-scale the data from the 0-to-1 range back to dollars
# These constants are from when the data was originally scaled down to the 0-to-1 range
prediction = prediction + 0.1159
prediction = prediction / 0.0000036968

print("Earnings Prediction for Proposed Product - ${}".format(prediction))
```

```
Earnings Prediction for Proposed Product - $264378.67422997503
```
