# Capstone Project

> **Copyright Notice**    
> 
> This IPython notebook is part of the **Deep Dive into Data Science** training program at Nexteer Automotive.  
> It incorporates materials from **Coursera**'s **Deep Learning Specialization**, **TensorFlow: Advanced Techniques Specialization**, and **Mathematics for Machine Learning and Data Science** Specialization, licensed under the **Creative Commons Attribution-ShareAlike 2.0 (CC BY-SA 2.0)**, as well as other sources (including, but not limited to, enhancements developed with the assistance of generative AI tools).  
> All original content created for this program, and all adaptations of source materials, are the intellectual property of Nexteer Automotive and are licensed under the same **Creative Commons Attribution-ShareAlike 2.0 (CC BY-SA 2.0)** license.

## $1.$ Dataset & Objective

### $1.1$ [FordA Dataset](https://www.timeseriesclassification.com/description.php?Dataset=FordA) 

This data was originally used in a competition at the IEEE World Congress on Computational Intelligence, 2008. The classification problem is to diagnose whether a certain symptom exists in an automotive subsystem. Each case consists of 500 measurements of engine noise and a class label. The competition posed two separate problems; for FordA, the training and test sets were collected under typical operating conditions, with minimal noise contamination.

| Train Size | Test Size | Length | Number of Classes | Number of Dimensions | Type | 
| -------- | -------- | -------- | -------- | -------- | -------- | 
| 3601  | 1320 | 500 | 2 | 1 | SENSOR |

In [None]:
# Code source: Timeseries classification from scratch - Keras Code Examples 
# Link: https://keras.io/examples/timeseries/timeseries_classification_from_scratch/
import numpy as np

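def readucr(filename):
    # Each row holds one series: column 0 is the class label,
    # the remaining columns are the measurements.
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0]
    x = data[:, 1:]
    return x, y.astype(int)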

root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")
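
Before modeling, it can help to confirm that the loaded arrays match the table above. The sketch below is one way to do that; `summarize_split` is a hypothetical helper (not part of the Keras example), and the demo arrays are synthetic stand-ins so the snippet runs without a download.

```python
import numpy as np

def summarize_split(x, y):
    """Return the shape and per-class counts for one data split."""
    classes, counts = np.unique(y, return_counts=True)
    return {
        "n_series": x.shape[0],
        "series_length": x.shape[1],
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
    }

# Synthetic stand-in with the same layout that readucr returns:
# x has shape (n_series, series_length), y has shape (n_series,).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(6, 500))
y_demo = np.array([-1, 1, 1, -1, 1, 1])

summary = summarize_split(x_demo, y_demo)
print(summary)
```

In the notebook, calling `summarize_split(x_train, y_train)` and `summarize_split(x_test, y_test)` should report 3601 and 1320 series of length 500. The class counts also show which label values the files actually use, which matters for the loss function chosen later.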

### $1.2$ Objective

Your objective is to develop a binary classifier for the FordA time series dataset. The minimum performance for your chosen metric(s) (e.g., accuracy, precision, recall, F1) is `90%` on the test set. **Transfer learning is not permitted**.

A more challenging goal is to exceed `93%` test performance while keeping the generalization gap (overfitting) below `4%`.
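
The two targets can be checked mechanically. The sketch below assumes the generalization gap is the training score minus the test score for the same metric, with scores expressed as fractions; `generalization_gap` and `meets_stretch_goal` are hypothetical helper names.

```python
def generalization_gap(train_score, test_score):
    """Gap between training and test score (assumed definition: train - test)."""
    return train_score - test_score

def meets_stretch_goal(train_score, test_score, min_test=0.93, max_gap=0.04):
    """True if the test score exceeds min_test and the gap stays below max_gap."""
    gap = generalization_gap(train_score, test_score)
    return test_score > min_test and gap < max_gap

# Illustrative numbers only, not results from any real model.
ok = meets_stretch_goal(train_score=0.97, test_score=0.94)       # gap of about 0.03
overfit = meets_stretch_goal(train_score=0.99, test_score=0.94)  # gap of about 0.05
print(ok, overfit)
```

A model can clear the `93%` bar and still fail the stretch goal if its training score runs too far ahead of its test score, as the second call illustrates.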

**Note:** While the state-of-the-art (SOTA) performance on this dataset surpasses these targets, your primary focus should be on solving the problem by applying the techniques you've learned—not outperforming the SOTA.


## $2.$ Model Development

### $2.1$ Data Preparation

* Research what type of data preparation is appropriate for time series data
* Prepare data for model development as needed

**Hint**: Minimal data preparation is needed for this dataset!
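
As a starting point, the following is a minimal sketch of preparation steps that are common for univariate series fed to `Conv1D` layers. It rests on assumptions you should verify yourself: the labels are taken to be `-1` and `1` (check with `np.unique`), and per-series z-normalization may turn out to be a no-op if the files are already normalized. `prepare_series` and `remap_labels` are hypothetical helper names.

```python
import numpy as np

def prepare_series(x, eps=1e-8):
    """Z-normalize each series and add a trailing channel axis."""
    x = np.asarray(x, dtype="float32")
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    x = (x - mean) / (std + eps)   # eps guards against constant series
    return x[..., np.newaxis]      # (n_series, length) -> (n_series, length, 1)

def remap_labels(y):
    """Map labels -1 -> 0 so they suit a sigmoid output (assumes labels are -1 and 1)."""
    y = np.asarray(y).copy()
    y[y == -1] = 0
    return y

# Synthetic stand-in data so the sketch runs offline.
rng = np.random.default_rng(0)
x_prepared = prepare_series(rng.normal(loc=3.0, scale=2.0, size=(3, 500)))
y_prepared = remap_labels(np.array([-1, 1, 1]))
print(x_prepared.shape, y_prepared)
```

The trailing axis of size 1 is the channel dimension that `Conv1D` expects for a one-dimensional sensor signal.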

### $2.2$ Training & Validation

Follow the Model Development Workflow (Module 3 - Section 4). Please make sure you:
* Define appropriate metric(s) and explain the rationale behind choosing them
* Define a baseline for model development
* Develop a deep Convolutional Neural Network with `Conv1D` layers using the Functional API (you may use other layer types such as `LSTM` or `GRU` in combination with `Conv1D`, but not without it!)
* Convert your best model to a class object using the Subclassing API
* Use a custom training loop with graph-mode execution and see how much you can improve training efficiency
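
For the metric and baseline items above, one possible starting point is a majority-class baseline scored with the usual binary metrics. This is a sketch, not the required approach: it assumes labels are encoded as `0` and `1` with `1` as the positive class, and `binary_metrics` and `majority_baseline` are hypothetical helper names.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = positive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def majority_baseline(y_train, y_eval):
    """Predict the most frequent training class for every evaluation sample."""
    classes, counts = np.unique(y_train, return_counts=True)
    majority = classes[np.argmax(counts)]
    return np.full(np.shape(y_eval), majority)

# Toy labels for illustration only.
y_train_demo = np.array([0, 1, 1, 1])
y_eval_demo = np.array([1, 0, 1, 1, 0])
baseline_scores = binary_metrics(y_eval_demo, majority_baseline(y_train_demo, y_eval_demo))
print(baseline_scores)
```

Any trained model should beat these baseline numbers on the same split; if the classes are roughly balanced, the baseline accuracy will sit near 50%, which is why accuracy alone may be a reasonable headline metric in that case.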

**Hint**: Try to implement the architectures that you have studied, but using `Conv1D` instead of `Conv2D`.
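
To make the `Conv1D` shape convention concrete, here is a deliberately tiny Functional API sketch. It is not a recommended architecture: the filter count, kernel size, and the function name `build_conv1d_sketch` are arbitrary assumptions, and the model is far too small to reach the performance targets. It only shows that the input is `(timesteps, channels)` rather than `(height, width, channels)`.

```python
import numpy as np
from tensorflow import keras

def build_conv1d_sketch(timesteps=500, channels=1):
    """Tiny Functional API model; Conv1D slides along the time axis only."""
    inputs = keras.Input(shape=(timesteps, channels))
    x = keras.layers.Conv1D(filters=16, kernel_size=3, padding="same")(inputs)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.ReLU()(x)
    x = keras.layers.GlobalAveragePooling1D()(x)   # collapse the time axis
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)

model = build_conv1d_sketch()

# A dummy batch of 4 series confirms the expected input and output shapes.
dummy_batch = np.zeros((4, 500, 1), dtype="float32")
probs = model(dummy_batch)
print(probs.shape)
```

When porting a 2D architecture, the same substitution applies to pooling layers, for example `MaxPooling1D` in place of `MaxPooling2D`.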

## $3.$ Delivering Results

### $3.1$ High-Level Presentation

Put together a high-level presentation for a not-so-technical audience that includes the following:  
* Problem Statement (ML Objective)
* Dataset overview
* Evaluation Metrics
* Data Preparation 
* Baseline Model Performance
* Explored Model Architectures
* Best Model Performance
    * Architecture
    * Metric scores (train-dev & test)
    * Outline of advanced techniques utilized

### $3.2$ Technical Discussion & Code Review

* Use a copy of this notebook for model development
* Prepare for a code walk-through session

Good luck!