## AISE4010 - Assignment 3 - Time Series Classification using TCN and Transformer + Hyperparameter Tuning

## Grade: 100 points

### Instructions

#### Follow These Steps before submitting your assignment

1. Complete the notebook.

2. Make sure all plots have axis labels.

3. Once the notebook is complete, restart your kernel and rerun every cell by clicking 'Kernel' > 'Restart & Run All'.

4. Fix any errors until your notebook runs without any problems.

5. Submit one completed notebook for the group to OWL by the deadline.

6. Make sure to reference all external code and documentation used.

### Dataset

The dataset is a sample of 46 satellite images, collected in 2006, of a 24 km × 24 km area in southwestern France near Toulouse. The dataset uses 3 output classes (2 available) for arable soil classification, based on the following paper: https://arxiv.org/pdf/1811.10166.

You will use the helper functions below to prepare the data for deep learning models.

In [None]:
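# Imports required by the helper functions below.
import numpy as np
import pandas as pd

# Call this helper by passing in the file name of the provided training or test set.
def read_SITS_data(name_file):
    # The file is comma-separated and has no header row.
    data = pd.read_table(name_file, sep=',', header=None)

    # Column 0 holds the class label; every label greater than 1 is mapped to 0.
    y_data = data.iloc[:,0]
    y = np.asarray(y_data.values, dtype='uint8')
    y[y>1] = 0

    # Column 1 holds the polygon ID of each row.
    polygonID_data = data.iloc[:,1]
    polygon_ids = polygonID_data.values
    polygon_ids = np.asarray(polygon_ids, dtype='uint16')

    # The remaining columns hold the time series values.
    X_data = data.iloc[:,2:]
    X = X_data.values
    X = np.asarray(X, dtype='float32')

    return X, polygon_ids, y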

In [None]:
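def custom_feature_scaling(train, test):
    # Expects 3-D arrays; the percentiles are taken over the first two axes,
    # which leaves one 2nd and one 98th percentile per channel (last axis).
    min_per = np.percentile(train, 2, axis=(0,1))
    max_per = np.percentile(train, 100-2, axis=(0,1))

    # Both sets are scaled with the statistics of the training set only.
    new_train = (train-min_per)/(max_per-min_per)
    new_test = (test-min_per)/(max_per-min_per)

    return new_train, new_test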
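
To illustrate what the scaling helper computes, here is a minimal sketch on hypothetical random data (the array sizes and value ranges are assumptions chosen for illustration, not properties of the real dataset). It repeats the same percentile arithmetic inline and shows that there is one percentile per channel, and that values outside the 2nd to 98th percentile range land slightly outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy set: 100 series, 5 time steps, 3 channels with different ranges.
train = rng.uniform(low=[0, 10, 100], high=[1, 20, 200], size=(100, 5, 3)).astype("float32")

# Reducing over axes 0 and 1 (samples and time steps) leaves one value per channel.
min_per = np.percentile(train, 2, axis=(0, 1))
max_per = np.percentile(train, 98, axis=(0, 1))
print(min_per.shape)  # (3,)

# Broadcasting applies each channel's percentiles along the last axis.
scaled = (train - min_per) / (max_per - min_per)

# Roughly 96% of the values fall inside [0, 1]; the tails fall just outside.
inside = np.mean((scaled >= 0) & (scaled <= 1))
print(scaled.min() < 0, scaled.max() > 1, round(float(inside), 2))
```

Percentile-based scaling of this kind is less sensitive to extreme values than plain min-max scaling, at the cost of not guaranteeing a strict [0, 1] range.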

### Question 1 - Data Preprocessing (15%)
- Q1.1 Call "read_SITS_data()" for the training set and store the results as X_train, polygon_ids_train, and y_train.
- Q1.2 Call "read_SITS_data()" above for the test set and store the results as X_test, polygon_ids_test, and y_test.
- Q1.3 Reshape the training and test sets.
  - Each set must be reshaped into a 3-D array. The first dimension will be the number of rows of the original set. The second dimension will be int(x / 3), where x is the number of columns of the original set and int() is a casting function. The third dimension will be 3 (number of channels).
- Q1.4 Call "custom_feature_scaling()" with the training and test sets. Save the results as the final sets for use.
- Q1.5 How many entries are in the training set? How many time steps are in each entry? How many features are there for each time step? How many labels for each entry?
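
The reshape described in Q1.3 can be sketched on a tiny hypothetical table. The array `flat` below is made up for illustration, and the sketch assumes that the 3 channel values of each time step sit next to each other in the original columns; check that assumption against the real files before relying on it.

```python
import numpy as np

# Hypothetical flat table: 4 rows, 6 columns = 2 time steps x 3 channels.
flat = np.arange(24, dtype="float32").reshape(4, 6)

n_channels = 3
n_steps = int(flat.shape[1] / n_channels)

# (rows, columns) -> (rows, time steps, channels)
cube = flat.reshape(flat.shape[0], n_steps, n_channels)

print(cube.shape)  # (4, 2, 3)
print(cube[0])     # the first row, split into time steps of 3 channel values
```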


*Write your Answer to Q1.5 here:*



### Question 2 - Temporal Convolutional Network
- Q2.1 Create a Sequential model for classification. The model should have a TCN layer of size 64, a fully connected layer of size 256, a dropout layer with a rate of 0.3, and a fully connected output layer with Softmax activation (Hint: the logits axis should be on 0). Train the model on the provided dataset for 20 epochs, using a batch size of 32 and the Adam optimizer. Print the model summary.
- Q2.2 Train the model with the same parameters, print the model summary and evaluate the model's accuracy on the test set. Print the accuracy.
- Q2.3 Why do we use the Softmax activation on the output layer? How does this contrast with using ReLU instead, and in what scenarios does the difference matter?
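
As a starting point for thinking about Q2.3, the following NumPy sketch applies both activations to the same hypothetical logits. The `softmax` and `relu` functions here are hand-written for illustration and are not the Keras implementations; the logits are made-up values.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the maximum for numerical stability before exponentiating.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def relu(x):
    # Element-wise: negative values become 0, positive values pass through.
    return np.maximum(x, 0.0)

# Hypothetical logits for 2 samples and 2 classes.
logits = np.array([[2.0, -1.0], [0.5, 0.5]])

probs = softmax(logits)
print(probs)               # non-negative values
print(probs.sum(axis=-1))  # each row sums to 1
print(relu(logits))        # negatives clipped to 0, rows are not normalised
```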


*Write your Answer to Q2.3 Here:*




### Question 3 - Transformer Model
- Q3.1 Create a transformer encoder block. It should use MultiHeadAttention with a residual connection. The projection layers can be two Conv1D layers, based on the number of feed-forward dimensions and with kernel sizes of 1.
- Q3.2 Define the model. It should have 4 encoder blocks, each with 256 heads and a feed-forward dimension of 4. Add a flatten layer, then a fully connected layer of size 2 and a fully connected output layer.
- Q3.3 Print the model summary, then train the model for 50 epochs with a batch size of 32. Evaluate the model accuracy on the test set and print it.
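
To make the idea of attention with a residual connection concrete, here is a minimal NumPy sketch of a single-head self-attention step. It is a simplification under stated assumptions: one head instead of Keras `MultiHeadAttention`, random hypothetical weights, and no layer normalisation, dropout, or Conv1D projection layers. The function name `self_attention_block` is made up for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def self_attention_block(x, w_q, w_k, w_v):
    # Project the same input into queries, keys and values (self-attention).
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores between every pair of time steps.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    attended = weights @ v
    # Residual connection: add the block input back onto the attention output.
    return x + attended, weights

rng = np.random.default_rng(0)
steps, channels = 5, 3  # hypothetical sequence length and feature count
x = rng.normal(size=(steps, channels))
w_q, w_k, w_v = (rng.normal(size=(channels, channels)) for _ in range(3))

out, weights = self_attention_block(x, w_q, w_k, w_v)
print(out.shape)      # (5, 3)
print(weights.shape)  # (5, 5)
```

Because of the residual connection, the output keeps the shape of the input, which is what allows several encoder blocks to be stacked one after another.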


### Question 4 - Hyperparameter Tuning
- Q4.1 Define a search space for the number of neurons in the fully connected layer that follows the flatten layer. The lower bound should be 2, the upper bound should be 16, and it should search every other value in between. Also have the tuner decide whether or not a dropout layer with a rate of 0.3 should be added after that layer.
- Q4.2 Using GridSearch, search for the best hyperparameters with respect to accuracy over 50 epochs.
- Q4.3 Using the best hyperparameters, rebuild the model and print the model accuracy.
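
To get a feel for how large the grid is, the search space from Q4.1 can be enumerated in plain Python. This sketch does not use a tuning library; it assumes that "every other value" means a step of 2 starting at the lower bound, and the dictionary keys `units` and `use_dropout` are hypothetical names.

```python
from itertools import product

# Assumed reading of Q4.1: 2, 4, ..., 16 (both bounds included, step of 2).
units_options = list(range(2, 17, 2))
# Whether a dropout layer is added after the fully connected layer.
dropout_options = [False, True]

# A grid search tries every combination of the two hyperparameters.
grid = [
    {"units": units, "use_dropout": use_dropout}
    for units, use_dropout in product(units_options, dropout_options)
]

print(units_options)  # [2, 4, 6, 8, 10, 12, 14, 16]
print(len(grid))      # 16
```

Each of these combinations corresponds to one full training run, so the cost of an exhaustive grid grows multiplicatively with every hyperparameter added.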

### Question 6 - Discussion (5%)
- Q6.1 Indicate other hyperparameters relevant to transformers that can be tuned.
- Q6.2 What are the advantages and disadvantages of using GridSearch for finding optimal hyperparameters?


*Write your Answer to Q6.1 and Q6.2 Here:*
