## CSC 580: Critical Thinking 2 - Predicting Future Sales
In a nutshell, *sales_data_training.csv* and *sales_data_test.csv* contain data that will be used to train and test a neural network that predicts how much money can be expected from the future sale of new video games. The .csv files were retrieved from one of [Toni Esteves's repos](https://github.com/toniesteves/adam-geitgey-building-deep-learning-keras/tree/master/03).

The columns in the data are defined as follows:
- critic_rating : an average rating out of five stars
- is_action : tells us if this was an action game
- is_exclusive_to_us : tells us if we have an exclusive deal to sell this game
- is_portable : tells us if this game runs on a handheld video game system
- is_role_playing : tells us if this is a role-playing game
- is_sequel : tells us if this game was a sequel to an earlier video game and part of an ongoing series
- is_sports : tells us if this was a sports game
- suitable_for_kids : tells us if this game is appropriate for all ages
- total_earnings : tells us how much money the store has earned in total from selling the game to all customers
- unit_price : tells us the retail price of a single copy of the game

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from keras.models import Sequential
from keras import layers 
from keras import activations
from keras import optimizers
from keras import losses

#### Part 1: Preparing the Dataset
The numerical data needs to be scaled to a common range so that the network trains more effectively.
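As a rough illustration of what this scaling does, the sketch below reproduces min-max scaling to the range [0, 1] with plain NumPy. The sample arrays and the helper names `fit_min_max` and `apply_min_max` are hypothetical; the `scale` and `offset` values are meant to play the same role as the `scale_` and `min_` attributes of scikit-learn's `MinMaxScaler`. The sketch follows the common convention of deriving the parameters from the training set only and reusing them for the test set.

```python
import numpy as np

def fit_min_max(data):
    """Return per-column (scale, offset) so that data * scale + offset lies in [0, 1]."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Guard against constant columns, where max - min would be zero
    col_range = np.where(col_max > col_min, col_max - col_min, 1.0)
    scale = 1.0 / col_range
    offset = -col_min * scale
    return scale, offset

def apply_min_max(data, scale, offset):
    """Apply previously computed scaling parameters to any data set."""
    return data * scale + offset

# Hypothetical toy data: two columns, three training rows, one test row
train = np.array([[2.0, 100.0], [4.0, 300.0], [3.0, 200.0]])
test = np.array([[3.5, 250.0]])

scale, offset = fit_min_max(train)                   # parameters come from training data only
train_scaled = apply_min_max(train, scale, offset)
test_scaled = apply_min_max(test, scale, offset)     # test data reuses the same parameters
```

With this convention a test value can fall slightly outside [0, 1] if it lies beyond the training range, which is generally considered acceptable.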

In [5]:
# Load the training and testing data
train_data = pd.read_csv("sales_data_training.csv", dtype=float)
test_data = pd.read_csv("sales_data_test.csv", dtype = float)

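# Scale the data using sklearn
scaler = MinMaxScaler(feature_range=(0,1))
train_data_scaled = scaler.fit_transform(train_data)
# Note: fit_transform refits the scaler on the test data, so the test set is
# scaled with its own min/max; scaler.transform(test_data) would instead reuse
# the training-set scaling.
test_data_scaled = scaler.fit_transform(test_data)

# Report the scaling applied to total_earnings (column index 8).
# Because of the refit above, these are the test-set parameters.
print("Note: total_earnings values were scaled by multiplying by {:.10f} and adding {:.6f}".format(scaler.scale_[8], scaler.min_[8]))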

# Create new DataFrames
df_train_scaled = pd.DataFrame(train_data_scaled, columns=train_data.columns.values)
df_test_scaled = pd.DataFrame(test_data_scaled, columns=test_data.columns.values)

# Save scaled data
df_train_scaled.to_csv("sales_data_training_scaled.csv", index=False)
df_test_scaled.to_csv("sales_data_testing_scaled.csv", index=False)


Note: total_earnings values were scaled by multiplying by 0.0000042367 and adding -0.153415
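The two constants in this note make it possible to convert a scaled `total_earnings` value back to the original currency units by reversing the transformation: subtract the added offset, then divide by the multiplier. The sketch below hard-codes the rounded constants from the printout; the function names and the sample input are hypothetical.

```python
SCALE = 0.0000042367   # multiplier printed in the note (rounded)
OFFSET = -0.153415     # additive term printed in the note (rounded)

def scale_earnings(original_value):
    """Forward transformation: scaled = original * SCALE + OFFSET."""
    return original_value * SCALE + OFFSET

def unscale_earnings(scaled_value):
    """Inverse transformation: original = (scaled - OFFSET) / SCALE."""
    return (scaled_value - OFFSET) / SCALE

example_scaled = 0.5   # hypothetical scaled value, e.g. a network output
example_original = unscale_earnings(example_scaled)
print("Scaled {} corresponds to roughly {:.0f} in original units".format(example_scaled, example_original))
```

Since the printed constants are rounded, the result is approximate. When the fitted `scaler` object is still in memory, its `inverse_transform` method works on full rows with all ten columns and avoids the rounding.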


#### Part 2: Coding the Network

In [2]:
# Load the training data
training_data_df = pd.read_csv("sales_data_training_scaled.csv")

X = training_data_df.drop('total_earnings', axis=1).values
Y = training_data_df[['total_earnings']].values

In [3]:
# Model definition
model = Sequential(
    [
        layers.Input((9,)),
        layers.Dense(50, activation=activations.relu),
        layers.Dense(100, activation=activations.relu),
        layers.Dense(50, activation=activations.relu),
        layers.Dense(1, activation=activations.linear)
    ]
)

# Adam optimizer with mean squared error as the regression loss
model.compile(optimizer='adam', loss=losses.mean_squared_error)
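To get a feel for the size of this network, the trainable parameters can be counted by hand: each `Dense` layer has `inputs * units` weights plus `units` biases. The sketch below does that arithmetic for the layer sizes used in the model definition; the helper `dense_param_count` is a hypothetical name, and `model.summary()` in Keras should report the same kind of breakdown.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the width of each Dense layer in the model
layer_sizes = [9, 50, 100, 50, 1]

per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(per_layer)
print(per_layer, total_params)
```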



#### Part 3: Training the Network

In [4]:
# Train for 50 epochs in mini-batches of 100 rows, reshuffling the rows each epoch
model.fit(X, Y, batch_size=100, epochs=50, verbose=2, shuffle=True)



Epoch 1/50
10/10 - 0s - loss: 0.0388
Epoch 2/50
10/10 - 0s - loss: 0.0145
Epoch 3/50
10/10 - 0s - loss: 0.0065
Epoch 4/50
10/10 - 0s - loss: 0.0032
Epoch 5/50
10/10 - 0s - loss: 0.0018
Epoch 6/50
10/10 - 0s - loss: 0.0013
Epoch 7/50
10/10 - 0s - loss: 9.2598e-04
Epoch 8/50
10/10 - 0s - loss: 7.0474e-04
Epoch 9/50
10/10 - 0s - loss: 5.6132e-04
Epoch 10/50
10/10 - 0s - loss: 4.2288e-04
Epoch 11/50
10/10 - 0s - loss: 3.4056e-04
Epoch 12/50
10/10 - 0s - loss: 2.7659e-04
Epoch 13/50
10/10 - 0s - loss: 2.3773e-04
Epoch 14/50
10/10 - 0s - loss: 2.0718e-04
Epoch 15/50
10/10 - 0s - loss: 1.6413e-04
Epoch 16/50
10/10 - 0s - loss: 1.3969e-04
Epoch 17/50
10/10 - 0s - loss: 1.2719e-04
Epoch 18/50
10/10 - 0s - loss: 1.1236e-04
Epoch 19/50
10/10 - 0s - loss: 9.7266e-05
Epoch 20/50
10/10 - 0s - loss: 9.2602e-05
Epoch 21/50
10/10 - 0s - loss: 8.3568e-05
Epoch 22/50
10/10 - 0s - loss: 7.7755e-05
Epoch 23/50
10/10 - 0s - loss: 7.8229e-05
Epoch 24/50
10/10 - 0s - loss: 7.2086e-05
Epoch 25/50
10/10 - 0s - 

<keras.callbacks.History at 0x7fcf99027f40>
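The `10/10` in each log line is the number of batches processed per epoch, which is the row count divided by the batch size, rounded up. A small sketch of that arithmetic follows, assuming a hypothetical training set of 1000 rows (the notebook does not state the row count); the helper name `steps_per_epoch` is also an assumption.

```python
import math

def steps_per_epoch(n_rows, batch_size):
    """Number of batches needed to see every row once; the last batch may be smaller."""
    return math.ceil(n_rows / batch_size)

# Assumed 1000 training rows with batch_size=100, as passed to model.fit
train_steps = steps_per_epoch(1000, 100)

# Any row count from 901 to 1000 would produce the same step count
same_steps = steps_per_epoch(901, 100)
print(train_steps, same_steps)
```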

#### Part 4: Evaluating the Network

In [5]:
testing_data_df = pd.read_csv("sales_data_testing_scaled.csv")

X_TEST = testing_data_df.drop('total_earnings', axis=1).values
Y_TEST = testing_data_df[['total_earnings']].values

In [6]:
test_error_rate = model.evaluate(X_TEST, Y_TEST, verbose=2)
print("The mean squared error (MSE) for the test data set is: {}".format(test_error_rate))

13/13 - 0s - loss: 0.0012
The mean squared error (MSE) for the test data set is: 0.001168126822449267
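An MSE in scaled units is hard to interpret on its own. One way to make it more tangible is to take the square root (RMSE) and divide by the multiplier from the scaling note printed during data preparation, which gives a typical prediction error in the original currency units. The additive offset cancels out because an error is a difference of two scaled values. This is a back-of-the-envelope sketch that assumes the rounded multiplier from that note is the one applied to the test targets; the variable names are hypothetical.

```python
import math

mse_scaled = 0.001168126822449267   # test MSE reported above
scale = 0.0000042367                # rounded multiplier from the scaling note

rmse_scaled = math.sqrt(mse_scaled)         # typical error in scaled units
rmse_original_units = rmse_scaled / scale   # undo the multiplication to get original units

print("RMSE (scaled units): {:.4f}".format(rmse_scaled))
print("RMSE (original units): {:.0f}".format(rmse_original_units))
```

Read this way, the figure is an order-of-magnitude estimate of how far a single earnings prediction tends to be off, not an exact error bound.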
