Name: Zete Dai

NetID: zd790

This solution is based on the PyTorch course: [Intro to Deep Learning with PyTorch](https://classroom.udacity.com/courses/ud188)

# Importing the Package

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 
import torch # PyTorch package
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# Preprocessing the Data

### Split the Data

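Because the Kaggle submission scoring appears to be wrong for some unknown reason, we hold out part of the training data as a test set so that we can evaluate the results ourselves. The last 20 series (128 measurements each) are held out.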

In [None]:
# read Kaggle datasets
X_train_old = pd.read_csv('/kaggle/input/career-con-2019/X_train.csv')
y_train_old = pd.read_csv('/kaggle/input/career-con-2019/y_train.csv')
# split X_train
samples = 20
time_series = 128
start_x = X_train_old.shape[0] - samples*time_series
X_train, X_test = X_train_old.iloc[:start_x], X_train_old.iloc[start_x:]
# split y_train
start_y = y_train_old.shape[0] - samples
y_train, y_test = y_train_old.iloc[:start_y], y_train_old.iloc[start_y:]
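
The split above is positional, so it only works if the rows of X are grouped by series and every series has the same number of measurements. The following is a minimal sketch of that idea on made-up data; the toy dataframes, the series length of 4 and the names `X_all`, `y_all`, `X_tr`, `X_te` are assumptions for illustration, not part of the competition data.

```python
import pandas as pd

# Toy stand-ins for X_train.csv / y_train.csv: 5 series of 4 measurements each
# (the real data has 128 measurements per series).
time_series = 4
n_series = 5
X_all = pd.DataFrame({
    "series_id": [s for s in range(n_series) for _ in range(time_series)],
    "measurement_number": list(range(time_series)) * n_series,
})
y_all = pd.DataFrame({
    "series_id": range(n_series),
    "surface": ["wood", "carpet", "tiled", "wood", "concrete"],
})

samples = 2  # number of whole series to hold out

# Cut X on a series boundary so that no series is split in half
start_x = X_all.shape[0] - samples * time_series
X_tr, X_te = X_all.iloc[:start_x], X_all.iloc[start_x:]
start_y = y_all.shape[0] - samples
y_tr, y_te = y_all.iloc[:start_y], y_all.iloc[start_y:]

# Sanity checks: no series on both sides, and X/y hold out the same series
train_ids = set(X_tr["series_id"])
test_ids = set(X_te["series_id"])
print(train_ids.isdisjoint(test_ids))
print(test_ids == set(y_te["series_id"]))
```

Running the same two checks on the real dataframes would confirm that the held-out rows of X and y refer to the same series.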

### Inspect the Data

In [None]:
# Inspect the data
display(X_train, y_train, X_test, y_test)

### Merge the Data

After inspecting the data, we find that "series_id" is the primary key in y and a foreign key in X that refers to y.

In deep learning, training is usually easier when each example in X has its own y label, so we merge the X and y dataframes on the key "series_id".

In [None]:
# Merge X and y so we have the y label for each row in X
# Because no data is missing, the default merge type (inner join) is sufficient
Xy_train = X_train.merge(y_train, on='series_id')
Xy_test = X_test.merge(y_test, on='series_id')
display(Xy_train, Xy_test)
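
To illustrate what the merge does, here is a small sketch on made-up data. The toy dataframes `X_toy` and `y_toy` are hypothetical; the `validate` argument is an optional pandas check that is not used in this notebook's own merge.

```python
import pandas as pd

# Hypothetical toy data: two series with two measurements each
X_toy = pd.DataFrame({"series_id": [0, 0, 1, 1],
                      "orientation_X": [0.1, 0.2, 0.3, 0.4]})
y_toy = pd.DataFrame({"series_id": [0, 1],
                      "surface": ["wood", "carpet"]})

# validate="many_to_one" makes pandas raise an error if series_id
# is not unique in y_toy, i.e. if it is not really a primary key
Xy_toy = X_toy.merge(y_toy, on="series_id", how="inner", validate="many_to_one")
print(Xy_toy)
```

Every row of `X_toy` receives the surface of its series, so the merged frame has as many rows as `X_toy`.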

### Extract the Features

After looking at each column of the Xy dataframe, we find that there are only 10 actual features for predicting the surface type: 'orientation_X', 'orientation_Y', 'orientation_Z', 'orientation_W', 'angular_velocity_X', 'angular_velocity_Y', 'angular_velocity_Z', 'linear_acceleration_X', 'linear_acceleration_Y', 'linear_acceleration_Z'. The other columns, such as 'row_id', 'series_id' and 'measurement_number', are only used for indexing.

In [None]:
# Features for predicting the surface type
X_columns = ['orientation_X', 'orientation_Y', 'orientation_Z', 'orientation_W', 'angular_velocity_X', 'angular_velocity_Y', 'angular_velocity_Z', 'linear_acceleration_X', 'linear_acceleration_Y', 'linear_acceleration_Z']

It's easier to use integers to indicate the classes, so we use dictionaries to encode each surface type as an integer and to decode it back.

In [None]:
# Use dictionaries to map surface types to integers (and back) for easier calculation
encode_dic = {'fine_concrete': 0, 
              'concrete': 1, 
              'soft_tiles': 2, 
              'tiled': 3, 
              'soft_pvc': 4,
              'hard_tiles_large_space': 5, 
              'carpet': 6, 
              'hard_tiles': 7, 
              'wood': 8}

decode_dic = {0: 'fine_concrete',
              1: 'concrete',
              2: 'soft_tiles',
              3: 'tiled',
              4: 'soft_pvc',
              5: 'hard_tiles_large_space',
              6: 'carpet',
              7: 'hard_tiles',
              8: 'wood'}
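
Since the two dictionaries must stay in sync, one option is to derive the decoder from the encoder instead of typing it by hand. This is a sketch of that alternative, not what the notebook does; `decode_from_encode` is a hypothetical name.

```python
# Same encoding as in the cell above
encode_dic = {'fine_concrete': 0,
              'concrete': 1,
              'soft_tiles': 2,
              'tiled': 3,
              'soft_pvc': 4,
              'hard_tiles_large_space': 5,
              'carpet': 6,
              'hard_tiles': 7,
              'wood': 8}

# Invert the mapping: integer class -> surface name
decode_from_encode = {index: name for name, index in encode_dic.items()}

# Round trip: encoding and then decoding returns the original name
print(decode_from_encode[encode_dic['carpet']])  # carpet
```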

### Convert into PyTorch tensors

To work with PyTorch, we have to convert the pandas dataframes into PyTorch tensors.

In [None]:
# Convert pandas dataframes into PyTorch tensors
X_train = torch.tensor(Xy_train[X_columns].values).float()
y_train = torch.tensor(Xy_train['surface'].map(encode_dic).values)
X_test = torch.tensor(Xy_test[X_columns].values).float()
y_test = torch.tensor(Xy_test['surface'].map(encode_dic).values)
display(X_train, y_train, X_test, y_test)

Check that the shapes of X and y are aligned:

In [None]:
display(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
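
Besides the shapes, the data types matter: the features are cast to 32-bit floats with `.float()`, and the class labels should be 64-bit integers, which is what `nn.CrossEntropyLoss` expects for class-index targets. A small sketch on a made-up two-row dataframe (the names `df_toy`, `X_t` and `y_t` are hypothetical):

```python
import pandas as pd
import torch

# Hypothetical two-row stand-in for the merged Xy dataframe
df_toy = pd.DataFrame({"orientation_X": [0.1, 0.2],
                       "surface": ["wood", "carpet"]})

X_t = torch.tensor(df_toy[["orientation_X"]].values).float()
y_t = torch.tensor(df_toy["surface"].map({"carpet": 6, "wood": 8}).values)

print(X_t.shape, X_t.dtype)
print(y_t.shape, y_t.dtype)
```

The first dimension of both tensors is the number of rows, so each feature row has exactly one label.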

# Build and Train the Neural Network

We first have to define our Neural Network. Then, at each training step (epoch):

1. Do a forward pass (propagation) to get the output of our Neural Network
2. Use the output of our Neural Network to calculate the cost function
3. Use the cost function to do back propagation, which calculates the gradient of the cost function with respect to each weight
4. Use an optimization algorithm (gradient descent) to update the weights

Therefore, while defining our Neural Network, we need to:
1. Define the structure of the Neural Network
2. Define the cost function and the optimization algorithm (gradient descent)

### Define the Structure of Neural Network


Each measurement has 10 features and there are 9 classes to predict, so the input layer has 10 neurons and the output layer has 9 outputs. The number of hidden layers and the number of units in each hidden layer are mostly experimental. Every layer except the output layer uses ReLU as its activation function; the output layer has no activation because of the cost function we choose (explained in the next section).

Here is an image preview of the structure of our Neural Network:
![NN Structure](https://i.imgur.com/G2k28an.png)
There are 7 layers (counting the input layer), with 10, 63, 54, 45, 36, 27 and 9 units respectively.

In [None]:
from torch import nn

model = nn.Sequential(nn.Linear(10, 63),
                      nn.ReLU(),
                      nn.Linear(63, 54),
                      nn.ReLU(),
                      nn.Linear(54, 45),
                      nn.ReLU(),
                      nn.Linear(45, 36),
                      nn.ReLU(),
                      nn.Linear(36, 27),
                      nn.ReLU(),
                      nn.Linear(27, 9)
                     )
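
As a quick sanity check of the architecture, the same stack of layers can be built from the list of layer sizes and run on a dummy batch. This is only a sketch: `sketch_model`, `sizes` and the random `dummy_batch` are hypothetical names and data, not part of the original notebook.

```python
import torch
from torch import nn

# Layer sizes from the text: input, five hidden layers, output
sizes = [10, 63, 54, 45, 36, 27, 9]

layers = []
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    layers.append(nn.Linear(n_in, n_out))
    layers.append(nn.ReLU())
layers.pop()  # drop the last ReLU: the output layer has no activation

sketch_model = nn.Sequential(*layers)

# A random batch of 4 measurements with 10 features each
dummy_batch = torch.randn(4, 10)
logits = sketch_model(dummy_batch)
print(logits.shape)  # torch.Size([4, 9])

# Total number of trainable weights and biases
n_params = sum(p.numel() for p in sketch_model.parameters())
print(n_params)
```

Each row of the output holds 9 raw scores (logits), one per surface class.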

### Define Cost Function and Optimization Algorithm (Gradient Descent)

The cost function we use is the cross-entropy loss. According to the official documentation of [nn.CrossEntropyLoss()](https://pytorch.org/docs/stable/nn.html?highlight=cross#torch.nn.CrossEntropyLoss), this criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class. Therefore, we don't need any activation function in the output layer (but if we want the probability of each class, we still have to apply a softmax function to the final output).
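
The equivalence stated above can be checked numerically. The sketch below uses random logits and arbitrary target classes as made-up inputs; it is an illustration, not part of the training code.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Made-up raw network outputs for 5 examples and 9 classes
logits = torch.randn(5, 9)
targets = torch.tensor([0, 3, 8, 1, 6])

# Cross-entropy applied directly to the raw outputs
ce = nn.CrossEntropyLoss()(logits, targets)

# The same loss computed in two steps: log-softmax, then negative log-likelihood
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

print(torch.allclose(ce, nll))  # True

# Probabilities are only needed for interpretation, not for the loss
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))  # each row sums to 1
```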

The optimization algorithm we use is [Adam](https://arxiv.org/abs/1412.6980), an alternative to plain batch gradient descent and stochastic gradient descent. According to its authors, Adam "is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters"[1]. The learning rate is mostly experimental.

[1]Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).

In [None]:
from torch import optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

### Use GPU to Speed Up

We can use the GPU to significantly speed up training.

In [None]:
# Set the computational device to the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the neural network model to GPU
model.to(device)

# Move all the tensors to GPU
X_train, y_train, X_test, y_test = X_train.to(device), y_train.to(device), X_test.to(device), y_test.to(device)

### Training the Neural Network

As stated before, at each training step (epoch):
1. Do a forward pass (propagation) to get the output of our Neural Network
2. Use the output of our Neural Network to calculate the cost function
3. Use the cost function to do back propagation, which calculates the gradient of the cost function with respect to each weight
4. Use an optimization algorithm (gradient descent) to update the weights

The reason we use [optimizer.zero_grad()](https://pytorch.org/docs/stable/optim.html?highlight=zero_grad#torch.optim.Optimizer.zero_grad) is that PyTorch tracks the operations applied to tensors that require gradients and uses that record to calculate the gradients of the weights during back propagation. These gradients are accumulated (summed) on every backward pass rather than overwritten, so we need to reset them manually before the forward pass in each epoch.
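
The accumulation behaviour is easy to see on a single tensor. This is a minimal sketch with made-up values; `w` stands in for a weight and `x` for an input.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # stand-in for a weight
x = torch.tensor([3.0, 4.0])                      # stand-in for an input

# First backward pass: d(w*x).sum()/dw = x
(w * x).sum().backward()
first = w.grad.clone()    # tensor([3., 4.])

# Second backward pass without resetting: gradients are added up
(w * x).sum().backward()
second = w.grad.clone()   # tensor([6., 8.])

# Reset, then backward again: back to the gradient of a single pass
w.grad.zero_()
(w * x).sum().backward()
third = w.grad.clone()    # tensor([3., 4.])

print(first, second, third)
```

`optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.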

The training process will take several minutes.

In [None]:
%%time
epochs = 2500

for e in range(epochs):
    
    # Reset the gradients
    optimizer.zero_grad()
    
    # Forward passing
    output = model(X_train)

    # Calculate the loss/cost function
    loss = criterion(output, y_train)
    
    # Back propagation
    loss.backward()
    
    # Update the weights
    optimizer.step()
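
The four steps can also be tried on a tiny made-up problem, recording the loss along the way to see whether training is making progress. Everything in this sketch is an assumption for illustration: the random toy data, the small `toy_model`, and the 300 epochs are not the notebook's real data or settings.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # make the toy run repeatable

# Hypothetical toy data: 64 examples, 10 features, 2 classes
X_toy = torch.randn(64, 10)
y_toy = (X_toy[:, 0] > 0).long()

toy_model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
toy_criterion = nn.CrossEntropyLoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.003)

losses = []
for e in range(300):
    toy_optimizer.zero_grad()             # reset the gradients
    output = toy_model(X_toy)             # 1. forward pass
    loss = toy_criterion(output, y_toy)   # 2. cost function
    loss.backward()                       # 3. back propagation
    toy_optimizer.step()                  # 4. update the weights
    losses.append(loss.item())            # keep the loss for inspection
    if e % 100 == 0:
        print(f"epoch {e}: loss = {loss.item():.4f}")
```

Note that each "epoch" here, as in the loop above, is a single full-batch update over the whole training set.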

# Predict the Surfaces

With our Neural Network trained, we can now predict the surface type in the test set and calculate the accuracy.

The reason we use [torch.no_grad()](https://pytorch.org/docs/stable/autograd.html?highlight=no_grad#torch.autograd.no_grad) is that we don't want PyTorch to track our calculations during prediction, because we don't need to back propagate here.
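
The effect of `torch.no_grad()` can be seen on a single layer. In this sketch the `layer` and the random input `x` are made up for illustration.

```python
import torch
from torch import nn

layer = nn.Linear(10, 9)
x = torch.randn(3, 10)

# Normal forward pass: the result is tracked for back propagation
tracked = layer(x)

# Forward pass inside no_grad: same values, but nothing is tracked
with torch.no_grad():
    untracked = layer(x)

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

Skipping the tracking saves memory and time when we only want predictions.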

In [None]:
with torch.no_grad():
    # Calculate the output of Neural Network
    network_output = model(X_test)
    
    # Use softmax to calculate the probability of each class; "dim=1" means across the columns
    probabilities = torch.softmax(network_output, dim=1)
    
    # Use argmax to find the class that has the highest probability
    predict = probabilities.argmax(dim=1)
    
    # Compare the predicted result with y_test to find the accuracy
    acc = (predict == y_test).sum().item() / y_test.shape[0]
    
    print(f"Accuracy = {acc}")
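
Softmax preserves the ordering of the scores within each row, so the predicted class is the same whether argmax is applied to the raw outputs or to the probabilities. The sketch below shows this on made-up logits and also decodes the predictions back to names; `toy_decode` is a hypothetical three-class subset of the decoding dictionary.

```python
import torch

# Made-up raw outputs for 2 examples and 3 classes
toy_logits = torch.tensor([[2.0, 0.5, -1.0],
                           [0.1, 0.2, 3.0]])
toy_probs = torch.softmax(toy_logits, dim=1)

# The winning class is the same before and after softmax
same = torch.equal(toy_logits.argmax(dim=1), toy_probs.argmax(dim=1))
print(same)  # True

# Decode integer predictions back to surface names
toy_decode = {0: 'fine_concrete', 1: 'concrete', 2: 'soft_tiles'}
labels = [toy_decode[i] for i in toy_probs.argmax(dim=1).tolist()]
print(labels)  # ['fine_concrete', 'soft_tiles']
```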