## CSC 580: Critical Thinking 2 - Predicting Future Sales
In a nutshell, *sales_data_test.csv* and *sales_data_test.csv* contain data that will be used to train a neural network to predict how much money can be expected form the future sale of new video games. The .csv files were retrieved from one of [Toni Esteves repos](https://github.com/toniesteves/adam-geitgey-building-deep-learning-keras/tree/master/03). 

The columns in the data are defined as follows:
- critic_rating : an average rating out of five stars
- is_action : tells us if this was an action game
- is_exclusive_to_us : tells us if we have an exclusiv deal to sell this game
- is_portable : tells us if this game runs on a handheld video game system
- is_role_playing : tells us if this is a role-playing game
- is_sequel : tells us if this game was a sequel to an earlier video game and part of an ongoing series
- is_sports : tells us if this was a sports game
- suitable_for_kids : tells us if this game is appropriate for all ages
- total_earning : tells us how much money the store has earned in total from selling the game to all customers
- unit_price : tells us for how much a single copy of the game retailed

In [22]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from keras.models import Sequential
from keras import layers 
from keras import activations
from keras import losses

#### Step 1: Prepare the Dataset
The numerical data needs to be scaled for better network training

In [10]:
# Load the training and testing data
train_data = pd.read_csv("sales_data_training.csv", dtype=float)
test_data = pd.read_csv("sales_data_test.csv", dtype = float)

# Scale the data using sklearn
scaler = MinMaxScaler(feature_range=(0,1))
train_data_scaled = scaler.fit_transform(train_data)
test_data_scaled = scaler.fit_transform(test_data)

# Print out adjustment
print("Note: total_earnings values were scaled by multiplying by {:.10f} and adding {:.6f}".format(scaler.scale_[8], scaler.min_[8]))

# Create new DataFrames
df_train_scaled = pd.DataFrame(train_data_scaled, columns=train_data.columns.values)
df_test_scaled = pd.DataFrame(test_data_scaled, columns=test_data.columns.values)

# Save scaled data
df_train_scaled.to_csv("sales_data_training_scaled.csv", index=False)
df_test_scaled.to_csv("sales_data_testing_scaled.csv", index=False)


Note: total_earnings values were scaled by multiplying by 0.0000042367 and adding -0.153415


#### Part 2: Coding the Network

In [15]:
# Load the training data
training_data_df = pd.read_csv("sales_data_training_scaled.csv")

X = training_data_df.drop('total_earnings', axis=1).values
Y = training_data_df[['total_earnings']].values

In [16]:
# Model definition
model = Sequential(
    [
        layers.Input((9,)),
        layers.Dense(50, activation=activations.relu,),
        layers.Dense(100, activation=activations.relu),
        layers.Dense(50, activation=activations.relu),
        layers.Dense(1, activation=activations.linear)
    ]
)

model.compile('adam', losses.mean_squared_error)

#### Part 3: Training the Network

In [17]:
model.fit(X,Y,batch_size = 100, epochs = 50, verbose=2, shuffle=True)

Epoch 1/50
10/10 - 0s - loss: 0.0337
Epoch 2/50
10/10 - 0s - loss: 0.0112
Epoch 3/50
10/10 - 0s - loss: 0.0052
Epoch 4/50
10/10 - 0s - loss: 0.0027
Epoch 5/50
10/10 - 0s - loss: 0.0016
Epoch 6/50
10/10 - 0s - loss: 9.2507e-04
Epoch 7/50
10/10 - 0s - loss: 6.6266e-04
Epoch 8/50
10/10 - 0s - loss: 4.9317e-04
Epoch 9/50
10/10 - 0s - loss: 4.1348e-04
Epoch 10/50
10/10 - 0s - loss: 3.4086e-04
Epoch 11/50
10/10 - 0s - loss: 2.8680e-04
Epoch 12/50
10/10 - 0s - loss: 2.3511e-04
Epoch 13/50
10/10 - 0s - loss: 2.0261e-04
Epoch 14/50
10/10 - 0s - loss: 1.7269e-04
Epoch 15/50
10/10 - 0s - loss: 1.5180e-04
Epoch 16/50
10/10 - 0s - loss: 1.3954e-04
Epoch 17/50
10/10 - 0s - loss: 1.1859e-04
Epoch 18/50
10/10 - 0s - loss: 1.0473e-04
Epoch 19/50
10/10 - 0s - loss: 9.4470e-05
Epoch 20/50
10/10 - 0s - loss: 8.5420e-05
Epoch 21/50
10/10 - 0s - loss: 7.5598e-05
Epoch 22/50
10/10 - 0s - loss: 6.8648e-05
Epoch 23/50
10/10 - 0s - loss: 6.2734e-05
Epoch 24/50
10/10 - 0s - loss: 5.8661e-05
Epoch 25/50
10/10 - 0

<keras.callbacks.History at 0x7fcfaa030700>

#### Part 4 : Evaluating and Saving the Network

In [18]:
testing_data_df = pd.read_csv("sales_data_testing_scaled.csv")

X_TEST = testing_data_df.drop('total_earnings', axis=1).values
Y_TEST = testing_data_df[['total_earnings']].values

In [19]:
test_error_rate = model.evaluate(X_TEST, Y_TEST, verbose=2)
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))

13/13 - 0s - loss: 0.0015
The mean squared error (MSE) for the test data set is: 0.0014870389131829143


In [20]:
model.save("trained_model.h5")

#### Part 5: Making Predictions

In [25]:
X_PRED = pd.read_csv("proposed_new_product.csv").values
model = tf.keras.models.load_model("trained_model.h5")
pred = model.predict(X_PRED)

# Scale
pred = pred[0][0]
pred = (pred + scaler.min_[8]) / scaler.scale_[8]

print("Earnings Prediction for Proposed Product - ${}".format(pred))

Earnings Prediction for Proposed Product - $168958.729593277
