# Unit 3 Assessment

In this assignment, we will focus on sensor data. The dataset contains accelerometer data from cell phones. An accelerometer helps measure the speed and acceleration of a cell phone's movement. Each row represents a single measurement captured on a timeline of 20 time steps (columns). This is a multiclass classification task: predict the type of transportation that each measurement (i.e., row) represents based on the accelerometer data.

## Description of Variables

You will use the **movement.csv** data set for this assignment. Each row represents a single measurement. The columns labeled 1 to 20 are the time steps on the timeline (there are 20 time steps, and each time step has only one measurement).
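
Keras recurrent layers expect three-dimensional input of shape (samples, time steps, features). The sketch below shows how a table of 20 time-step columns maps onto that shape. The array `flat_inputs` is made-up stand-in data, not the contents of movement.csv, and the names `flat_inputs` and `sequence_inputs` are assumptions used only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the 20 input columns: 4 rows, 20 time steps each
flat_inputs = np.arange(80, dtype="float32").reshape(4, 20)

n_steps = 20   # time steps per row
n_inputs = 1   # one measurement per time step

# Add a trailing axis so each row becomes a sequence of 20 one-value steps
sequence_inputs = flat_inputs.reshape(-1, n_steps, n_inputs)

print(flat_inputs.shape)      # (4, 20)
print(sequence_inputs.shape)  # (4, 20, 1)
```

Reshaping explicitly makes the `[n_steps, n_inputs]` input shape used by the LSTM and GRU layers visible in the data itself, rather than relying on the framework to adjust the rank.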

The last column is the target variable. It shows the label (category) of the measurement. Because it is a text-based column, **it must be converted to ordinal values.**
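
As a minimal sketch of what converting text labels to ordinal values looks like, the example below encodes a handful of made-up labels with scikit-learn's `OrdinalEncoder` and maps them back. The label names (`walk`, `bus`, `car`) are hypothetical; the real categories come from the Target column.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical labels; the real categories come from the Target column
labels = np.array([["walk"], ["bus"], ["car"], ["bus"]])

encoder = OrdinalEncoder()
# Categories are sorted before numbering, so bus=0, car=1, walk=2
codes = encoder.fit_transform(labels)

print(codes.ravel())                              # [2. 0. 1. 0.]
print(encoder.inverse_transform(codes).ravel())   # ['walk' 'bus' 'car' 'bus']
```

Keeping the fitted encoder around is useful because `inverse_transform` turns predicted class numbers back into readable labels.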

## Goal

Use the data set **movement.csv** to predict the column called **Target**. The input variables are columns labeled as **1 to 20**. 

## Submission:

Please save and submit this Jupyter notebook file. The correctness of the code matters for your grade. **Readability and organization of your code are also important.** You may lose points for submitting unreadable or undecipherable code. Therefore, use markdown cells to create sections, and use comments where necessary.


# Read and Prepare the Data (1 point)

In [None]:
import numpy as np
import pandas as pd

np.random.seed(42)

In [None]:
movement = pd.read_csv("movement.csv")
movement

In [None]:
from sklearn.model_selection import train_test_split

train_set, test_set = train_test_split(movement, test_size=0.3)

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

from sklearn.preprocessing import FunctionTransformer

In [None]:
train_target = train_set[['Target']]
test_target = test_set[['Target']]

train_inputs = train_set.drop(['Target'], axis=1)
test_inputs = test_set.drop(['Target'], axis=1)

In [None]:
from sklearn.preprocessing import OrdinalEncoder

ord_enc = OrdinalEncoder()

train_y = ord_enc.fit_transform(train_target)

train_y

In [None]:
test_y = ord_enc.transform(test_target)

test_y

In [None]:
train_inputs.shape, test_inputs.shape, train_y.shape, test_y.shape

# Find the baseline (0.5 points)

In [None]:
from sklearn.dummy import DummyClassifier

dummy_clf = DummyClassifier(strategy="most_frequent")

dummy_clf.fit(train_inputs, train_y)

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
#Baseline Train Accuracy
dummy_train_pred = dummy_clf.predict(train_inputs)

baseline_train_acc = accuracy_score(train_y, dummy_train_pred)

print('Baseline Train Accuracy: {}'.format(baseline_train_acc))

In [None]:
#Baseline Test Accuracy
dummy_test_pred = dummy_clf.predict(test_inputs)

baseline_test_acc = accuracy_score(test_y, dummy_test_pred)

print('Baseline Test Accuracy: {}'.format(baseline_test_acc))

# Build a cross-sectional (i.e., a regular) Neural Network model using Keras (with only one hidden layer) (2 points)

In [None]:
import tensorflow as tf
from tensorflow import keras

model = keras.models.Sequential([
    
    keras.layers.Flatten(input_shape=[20, 1]),
    keras.layers.Dense(20, activation='relu'),
    keras.layers.Dense(5, activation='softmax')
    
])

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Build a deep cross-sectional (i.e., regular) Neural Network model using Keras (with two or more hidden layers) (2 points)

In [None]:
model = keras.models.Sequential()

# The input shape must be a tuple: 20 features per row
model.add(keras.layers.Input(shape=(20,)))
model.add(keras.layers.Dense(18, activation='relu'))
model.add(keras.layers.Dense(15, activation='relu'))
model.add(keras.layers.Dense(12, activation='relu'))
model.add(keras.layers.Dense(5, activation='softmax'))

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Build an LSTM Model (with only one layer) (2 points)

In [None]:
n_steps = 20
n_inputs = 1

model = keras.models.Sequential([
    
    keras.layers.LSTM(20, input_shape=[n_steps, n_inputs]),
    keras.layers.Dense(5, activation='softmax')
])

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Build a deep LSTM Model (with only two layers) (2 points)

In [None]:
n_steps = 20
n_inputs = 1

model = keras.models.Sequential([
    keras.layers.LSTM(15, return_sequences=True, input_shape=[n_steps, n_inputs]),
    # Second and final LSTM layer (the task asks for exactly two)
    keras.layers.LSTM(7),
    keras.layers.Dense(5, activation='softmax')
])

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Build a GRU Model (with only one layer) (2 points)

In [None]:
n_steps = 20
n_inputs = 1

model = keras.models.Sequential([
    keras.layers.GRU(18, input_shape=[n_steps, n_inputs]),
    keras.layers.Dense(5, activation='softmax')
])

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Build a deep GRU Model (with only two layers) (2 points)

In [None]:
n_steps = 20
n_inputs = 1

model = keras.models.Sequential([
    keras.layers.GRU(18, return_sequences=True, input_shape=[n_steps, n_inputs]),
    # Second and final GRU layer (the task asks for exactly two)
    keras.layers.GRU(14),
    # Softmax (not sigmoid) gives a probability distribution over the 5 classes
    keras.layers.Dense(5, activation='softmax')
])

In [None]:
np.random.seed(42)
tf.random.set_seed(42)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# If multiclass, use "sparse_categorical_crossentropy" as the loss function
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])


history = model.fit(train_inputs, train_y, epochs=50,
                    validation_data=(test_inputs, test_y))

In [None]:
scores = model.evaluate(test_inputs, test_y, verbose=0)

print("%s: %.2f" % (model.metrics_names[0], scores[0]))
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

# Discussion

## List the test values of each model you built (0.5 points)

## Which model performs the best and why? (0.5 points) 

## How does it compare to baseline? (0.5 points)

# Extra credit: 2 points

The dataset is very small. This means your test values are likely unreliable. Use your best model and run a 10-fold cross validation on it. Then, find and report the mean accuracy score.

Note: to be eligible for this extra credit, you should run your 10-fold cross validation on the unsplit data.
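
The general pattern for this task (build a fresh model per fold, score the held-out fold, then average) can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, with five classes to match the five-unit output layers used in this notebook, and `DummyClassifier` is swapped in as a lightweight stand-in for the Keras model so the example runs quickly.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))      # synthetic stand-in for the 20 inputs
y = rng.integers(0, 5, size=100)    # synthetic stand-in for 5 encoded classes

kfold = KFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = []

for train_idx, test_idx in kfold.split(X):
    # Stand-in model; a new instance per fold so nothing carries over
    clf = DummyClassifier(strategy="most_frequent")
    clf.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

mean_accuracy = np.mean(fold_scores)
print("Mean accuracy over %d folds: %.3f" % (len(fold_scores), mean_accuracy))
```

The key design point is that model creation, training, and evaluation all happen inside the loop, so every fold contributes one score to the mean.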

In [None]:
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import KFold

In [None]:
inputs = np.concatenate((train_inputs, test_inputs), axis=0)
targets = np.concatenate((train_y, test_y), axis=0)

In [None]:
kfold = KFold(n_splits=10, shuffle=True)

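In [None]:
n_steps = 20
n_inputs = 1
fold_no = 1
fold_accuracies = []

for train, test in kfold.split(inputs, targets):
    # Build and compile a fresh model for every fold so no weights carry over
    model = keras.models.Sequential([
        keras.layers.GRU(18, input_shape=[n_steps, n_inputs]),
        keras.layers.Dense(5, activation='softmax')
    ])
    model.compile(loss=sparse_categorical_crossentropy,
                  optimizer=Adam(learning_rate=0.01),
                  metrics=['accuracy'])

    # Train on the nine training folds, then score the held-out fold
    model.fit(inputs[train], targets[train],
              batch_size=50,
              epochs=100,
              verbose=0)
    scores = model.evaluate(inputs[test], targets[test], verbose=0)

    print("Fold %d accuracy: %.2f%%" % (fold_no, scores[1]*100))
    fold_accuracies.append(scores[1])
    fold_no += 1

In [None]:
# Mean accuracy across the 10 folds
print("Mean accuracy: %.2f%%" % (np.mean(fold_accuracies)*100))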