<h1 style="font-size:3rem;color:rgb(0, 91, 94);text-align:center;">Machine Learning Project (Keras)</h1>
<hr style="border-top: 1px solid rgb(0, 91, 94);" />

This section of the project has four key requirements, each of which has been satisfied below:

- On the Keras website, there is an example of time-series anomaly detection. Re-create this example in a notebook of your own, explaining the concepts.

- Clearly explain each Keras function used, referring to the documentation.

- Include an introduction to your notebook, setting the context and describing what the reader can expect as they read down through the notebook.

- Include a conclusion section where you suggest improvements you could make to the analysis in the notebook.

### Introduction

The purpose of this notebook is to recreate the time-series anomaly detection example found on the Keras website, explain the main concepts, and describe the purpose of each function used. "Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow. It was developed with a focus on enabling fast experimentation" (https://keras.io/about/). For each section of the code below, the underlying concept is explained, along with any Keras functions it uses.

### Recreation

In [None]:
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from matplotlib import pyplot as plt

#### Concept: Setup

The purpose of this section is to import the different tools that will be needed in this example.

Numpy, Pandas, Keras (Layers), and Matplotlib are imported for later use.

#### Keras Functions:

- `from tensorflow import keras` (imports Keras functionality)
- `from tensorflow.keras import layers` (imports Keras layers, the basic building blocks of neural networks, https://keras.io/api/layers/)

<hr style="border-top: 1px solid rgb(0, 91, 94);" />

In [None]:
master_url_root = "https://raw.githubusercontent.com/numenta/NAB/master/data/"

df_small_noise_url_suffix = "artificialNoAnomaly/art_daily_small_noise.csv"
df_small_noise_url = master_url_root + df_small_noise_url_suffix
df_small_noise = pd.read_csv(
    df_small_noise_url, parse_dates=True, index_col="timestamp"
)

df_daily_jumpsup_url_suffix = "artificialWithAnomaly/art_daily_jumpsup.csv"
df_daily_jumpsup_url = master_url_root + df_daily_jumpsup_url_suffix
df_daily_jumpsup = pd.read_csv(
    df_daily_jumpsup_url, parse_dates=True, index_col="timestamp"
)

#### Concept: Load the data

The purpose of this section is to load in the data that will be used in this example.

The Numenta Anomaly Benchmark (NAB) dataset is used in this example. Its location is identified via a master URL.

Two CSV files from this dataset are used. Each is identified by a URL suffix, which is concatenated with the master URL to form the full address. Pandas reads both CSV files into DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html).

art_daily_small_noise.csv will be used for training.

art_daily_jumpsup.csv will be used for testing.
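
As a minimal sketch of what `pd.read_csv` does with `parse_dates=True` and `index_col="timestamp"`, the snippet below reads a small in-memory CSV instead of the NAB URL. The rows and the names `sample_csv` and `df_sample` are made up for illustration and are not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical three-row CSV in a two-column layout (timestamp, value).
sample_csv = io.StringIO(
    "timestamp,value\n"
    "2014-04-01 00:00:00,18.3\n"
    "2014-04-01 00:05:00,20.1\n"
    "2014-04-01 00:10:00,19.7\n"
)

# index_col makes the timestamp column the index;
# parse_dates=True asks pandas to parse that index as dates.
df_sample = pd.read_csv(sample_csv, parse_dates=True, index_col="timestamp")

print(type(df_sample.index).__name__)
print(df_sample.columns.tolist())
```

With the timestamps as the index, the only remaining column is `value`, which is why the later plots need no explicit x-axis argument.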

<hr style="border-top: 1px solid rgb(0, 91, 94);" />

In [None]:
print(df_small_noise.head())

print(df_daily_jumpsup.head())

#### Concept: Quick look at the data

The purpose of this section is to simply print the data to check that it has loaded in correctly.

When printing the data (which has been stored in Pandas Dataframes), the Pandas dataframe function "pandas.DataFrame.head" is called. This function will return the first n rows of the dataframe. As no n value is entered, the first five rows are returned, as five is the default value (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html).
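
The default behaviour of `head` can be checked on a small, made-up DataFrame. This is only an illustrative sketch; `df_demo` is a hypothetical name and is not part of the original example.

```python
import pandas as pd

# Hypothetical frame with eight rows, used only to show head().
df_demo = pd.DataFrame({"value": range(8)})

first_default = df_demo.head()  # n defaults to 5
first_three = df_demo.head(3)   # explicit n

print(len(first_default), len(first_three))
```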

<hr style="border-top: 1px solid rgb(0, 91, 94);" />

In [None]:
fig, ax = plt.subplots()
df_small_noise.plot(legend=False, ax=ax)
plt.show()

#### Concept: Visualize timeseries data without anomalies

The purpose of this section is to visualize the timeseries data without anomalies.

To achieve this, a plot of the data from art_daily_small_noise.csv is created using Matplotlib. This provides a visualisation of the data that will be used for training.

<hr style="border-top: 1px solid rgb(0, 91, 94);" />

In [None]:
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
plt.show()

#### Concept: Visualize timeseries data with anomalies

The purpose of this section is to visualize the timeseries data with anomalies.

To achieve this, a plot of the data from art_daily_jumpsup.csv is created using Matplotlib. This provides a visualisation of the data that will be used for testing. We will test if the sudden jump seen in the visualisation will be detected as an anomaly.

<hr style="border-top: 1px solid rgb(0, 91, 94);" />

#### Concept: Prepare training data

In [None]:
# Normalize and save the mean and std we get,
# for normalizing test data.
training_mean = df_small_noise.mean()
training_std = df_small_noise.std()
df_training_value = (df_small_noise - training_mean) / training_std
print("Number of training samples:", len(df_training_value))
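
The cell above standardizes the training data by subtracting its mean and dividing by its standard deviation, and keeps both statistics so the test data can be scaled the same way. A minimal sketch with made-up numbers follows; the arrays `train` and `test` are hypothetical, and NumPy is used in place of pandas.

```python
import numpy as np

# Made-up training and test values, for illustration only.
train = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
test = np.array([14.0, 30.0])

# pandas' DataFrame.std() uses the sample standard deviation (ddof=1)
# by default, so ddof=1 is passed here to match it.
mean = train.mean()
std = train.std(ddof=1)

train_scaled = (train - mean) / std
# The test data reuses the training statistics, not its own.
test_scaled = (test - mean) / std

print(train_scaled.mean(), train_scaled.std(ddof=1))
print(test_scaled)
```

After scaling, the training values have a mean of about 0 and a standard deviation of about 1, while a test value far from the training range maps to a large scaled value.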

#### Concept: Create sequences

In [None]:
TIME_STEPS = 288

# Generated training sequences for use in the model.
def create_sequences(values, time_steps=TIME_STEPS):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)


x_train = create_sequences(df_training_value.values)
print("Training input shape: ", x_train.shape)
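
To see what `create_sequences` produces, the sketch below runs the same sliding-window logic on a toy array. The input `toy_values` and the window length of 3 are assumptions chosen to keep the output readable; the notebook itself uses `TIME_STEPS = 288`.

```python
import numpy as np


def create_sequences(values, time_steps):
    # Slide a window of length time_steps over the values, one step at a time.
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : i + time_steps])
    return np.stack(output)


# Toy input: six values in a single column, shape (6, 1).
toy_values = np.arange(6).reshape(-1, 1)
toy_sequences = create_sequences(toy_values, time_steps=3)

print(toy_sequences.shape)  # (4, 3, 1): 6 - 3 + 1 = 4 windows
print(toy_sequences[:, :, 0])
```

In general, `N` values and a window of `time_steps` give `N - time_steps + 1` overlapping sequences, each of shape `(time_steps, features)`.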

#### Concept: Build a model

In [None]:
model = keras.Sequential(
    [
        layers.Input(shape=(x_train.shape[1], x_train.shape[2])),
        layers.Conv1D(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1D(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(
            filters=16, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Dropout(rate=0.2),
        layers.Conv1DTranspose(
            filters=32, kernel_size=7, padding="same", strides=2, activation="relu"
        ),
        layers.Conv1DTranspose(filters=1, kernel_size=7, padding="same"),
    ]
)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
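
The model above is a convolutional autoencoder: two strided `Conv1D` layers shorten the sequence, and the `Conv1DTranspose` layers expand it back to the input length. As a rough check of the sequence lengths, the sketch below applies the output-length rules for `padding="same"` in plain Python. The helper names are hypothetical, and the `Dropout` layers are omitted because they do not change the shape.

```python
import math


def conv1d_same_length(length, stride):
    # Conv1D with padding="same": output length is ceil(length / stride).
    return math.ceil(length / stride)


def conv1d_transpose_same_length(length, stride):
    # Conv1DTranspose with padding="same": output length is length * stride.
    return length * stride


length = 288  # TIME_STEPS
lengths = [length]

# Encoder: two Conv1D layers with strides=2.
for _ in range(2):
    length = conv1d_same_length(length, 2)
    lengths.append(length)

# Decoder: two Conv1DTranspose layers with strides=2,
# then a final one with the default stride of 1.
for stride in (2, 2, 1):
    length = conv1d_transpose_same_length(length, stride)
    lengths.append(length)

print(lengths)  # [288, 144, 72, 144, 288, 288]
```

The output length matches the input length, which is what allows the model to be trained to reconstruct its own input.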

#### Concept: Train the model

In [None]:
history = model.fit(
    x_train,
    x_train,
    epochs=50,
    batch_size=128,
    validation_split=0.1,
    callbacks=[
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, mode="min")
    ],
)
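
The `EarlyStopping` callback above monitors `val_loss` with `patience=5`, so training stops once the validation loss has failed to improve for five consecutive epochs. The following is a simplified sketch of the patience idea in plain Python, not Keras's actual implementation. The function name `early_stop_epoch` and the loss values are made up, and `patience=2` is used only to keep the example short.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember the new best loss and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count this epoch against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value (0.8) is reached at epoch 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = early_stop_epoch(losses, patience=2)
print(stopped_at)  # 3
```

Note that the improvement at epoch 4 is never seen, because the run has already stopped at epoch 3.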

#### Concept: Plot training and validation loss

In [None]:
plt.plot(history.history["loss"], label="Training Loss")
plt.plot(history.history["val_loss"], label="Validation Loss")
plt.legend()
plt.show()

#### Concept: Detecting anomalies

In [None]:
# Get train MAE loss.
x_train_pred = model.predict(x_train)
train_mae_loss = np.mean(np.abs(x_train_pred - x_train), axis=1)

plt.hist(train_mae_loss, bins=50)
plt.xlabel("Train MAE loss")
plt.ylabel("No of samples")
plt.show()

# Get reconstruction loss threshold.
threshold = np.max(train_mae_loss)
print("Reconstruction error threshold: ", threshold)
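
The cell above computes one mean absolute error (MAE) per training sequence and takes the largest as the anomaly threshold. A minimal sketch of that computation on made-up arrays follows; `x` and `x_pred` are hypothetical stand-ins for `x_train` and `x_train_pred`.

```python
import numpy as np

# Made-up inputs and reconstructions, shape (samples, time_steps, features).
x = np.array([[[0.0], [1.0], [2.0]],
              [[1.0], [1.0], [1.0]]])
x_pred = np.array([[[0.0], [1.5], [2.5]],
                   [[1.0], [1.0], [4.0]]])

# Average the absolute error over the time axis (axis=1):
# one value per sample and feature.
mae = np.mean(np.abs(x_pred - x), axis=1)

# The notebook takes the largest training error as the threshold.
threshold = np.max(mae)

print(mae.shape)
print(mae.ravel(), threshold)
```

Any test sequence whose reconstruction error exceeds this threshold is reconstructed worse than every training sequence, and is therefore flagged as anomalous.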

#### Concept: Compare reconstruction

In [None]:
# Checking how the first sequence is learnt
plt.plot(x_train[0])
plt.plot(x_train_pred[0])
plt.show()

#### Concept: Prepare test data

In [None]:
df_test_value = (df_daily_jumpsup - training_mean) / training_std
fig, ax = plt.subplots()
df_test_value.plot(legend=False, ax=ax)
plt.show()

# Create sequences from test values.
x_test = create_sequences(df_test_value.values)
print("Test input shape: ", x_test.shape)

# Get test MAE loss.
x_test_pred = model.predict(x_test)
test_mae_loss = np.mean(np.abs(x_test_pred - x_test), axis=1)
test_mae_loss = test_mae_loss.reshape((-1))

plt.hist(test_mae_loss, bins=50)
plt.xlabel("test MAE loss")
plt.ylabel("No of samples")
plt.show()

# Detect all the samples which are anomalies.
anomalies = test_mae_loss > threshold
print("Number of anomaly samples: ", np.sum(anomalies))
print("Indices of anomaly samples: ", np.where(anomalies))

#### Concept: Plot anomalies

In [None]:
# data i is an anomaly if samples [(i - timesteps + 1) to (i)] are anomalies
anomalous_data_indices = []
for data_idx in range(TIME_STEPS - 1, len(df_test_value) - TIME_STEPS + 1):
    if np.all(anomalies[data_idx - TIME_STEPS + 1 : data_idx]):
        anomalous_data_indices.append(data_idx)
        
df_subset = df_daily_jumpsup.iloc[anomalous_data_indices]
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend=False, ax=ax)
df_subset.plot(legend=False, ax=ax, color="r")
plt.show()
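
The loop above converts per-sequence anomaly flags back into indices of individual data points: a point is marked only if every window in the slice `anomalies[data_idx - TIME_STEPS + 1 : data_idx]` is flagged. The sketch below mirrors that loop on a toy example. The window length of 3, the ten data points, and the flag values are assumptions chosen for readability.

```python
import numpy as np

time_steps = 3   # toy window length (the notebook uses 288)
n_points = 10    # toy number of data points

# One flag per window; 10 points give 10 - 3 + 1 = 8 windows.
anomalies = np.array([False, False, True, True, True, True, False, False])

anomalous_data_indices = []
for data_idx in range(time_steps - 1, n_points - time_steps + 1):
    # The slice covers windows data_idx - time_steps + 1 up to,
    # but not including, data_idx.
    if np.all(anomalies[data_idx - time_steps + 1 : data_idx]):
        anomalous_data_indices.append(data_idx)

print(anomalous_data_indices)  # [4, 5, 6]
```

Requiring all of the windows in the slice to be flagged, rather than any one of them, keeps a single poorly reconstructed window from marking a data point as anomalous.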

### Conclusion