This Jupyter notebook contains information and research regarding the Keras machine learning API.  
The notebook will explain each Keras function that is used, referring to the documentation on Keras' website:  

Keras Team. Keras: the Python deep learning API, 2022a.  
URL https://keras.io/  
Keras Team. Timeseries anomaly detection using an autoencoder, 2022b.  
URL https://keras.io/examples/timeseries/timeseries_anomaly_detection/  

Keras is an open-source software library with a Python interface for artificial neural networks.  
Keras acts as an interface for the TensorFlow library, developed by the Google Brain team for the purposes of machine learning and artificial intelligence.  
Keras was created by François Chollet, a software engineer at Google, and the code is maintained on GitHub.  

Keras was designed to make it easy to build and experiment with neural networks. A neural network is a series of algorithms that aims to recognise patterns in data, in a way loosely modelled on how the human brain perceives such patterns.  
Models built with Keras can be deployed on the web, on iOS and Android devices, and on the Java Virtual Machine.

We will train and test the model on data from the Numenta Anomaly Benchmark (NAB) dataset.  

NAB provides timeseries data files, some of which contain labelled anomalous periods of behaviour. Each file is ordered and timestamped, with a single value per timestamp; the two files used here are artificially generated.  
A timeseries is a sequence of numerical data collected at successive times, in this case every 5 minutes over 14 days.  
Timeseries are important for prediction: historical data can be used to understand past results and to inform business decisions.  

Anomaly detection means developing a program that can detect data points in a data set that deviate from the expected pattern.  
Machine learning lets us automate this: a model learns what normal data looks like and flags points that do not fit, which is faster than inspecting the data by hand.  
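
As a point of comparison only, the simplest form of anomaly detection needs no neural network at all. The sketch below flags values that lie more than a chosen number of standard deviations from the mean. The function name `zscore_anomalies`, the threshold of 3 and the sample readings are illustrative assumptions, not part of this notebook's method.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be unusual.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up readings with one obvious spike at index 8.
readings = [20.0, 21.0, 19.5, 20.5, 20.0, 21.5, 19.0, 20.0, 80.0, 20.5, 19.5, 20.0]
flagged = zscore_anomalies(readings)
print(flagged)
```

A fixed rule like this only catches isolated extreme values. An autoencoder can also flag stretches of data whose shape is unusual even when each individual value is within the normal range.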
  
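We will build a convolutional autoencoder to analyse the NAB data set and visualise the results.  
An autoencoder is a neural network trained to reconstruct its own input; windows that it reconstructs poorly are candidates for anomalies.  
LSTM (Long Short-Term Memory), a recurrent neural network that learns during training which information to keep or forget, is a common alternative for sequence data, but the model below uses 1D convolutional layers.  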

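Two CSV files from NAB are used:  
First, the model is trained on the small noise CSV (art_daily_small_noise.csv), which contains no anomalies.  
Second, it is tested on the jumps up CSV (art_daily_jumpsup.csv), which contains anomalous behaviour.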

In [None]:
import pandas as pd

nab_url = "https://raw.githubusercontent.com/numenta/NAB/master/data/"

# training via small noise
df_small_noise_suffix = "artificialNoAnomaly/art_daily_small_noise.csv"
df_small_noise_url = nab_url + df_small_noise_suffix
df_small_noise = pd.read_csv(
    df_small_noise_url, parse_dates = True, index_col = "timestamp"
)

# testing via jumps up
df_daily_jumpsup_suffix = "artificialWithAnomaly/art_daily_jumpsup.csv"
df_daily_jumpsup_url = nab_url + df_daily_jumpsup_suffix
df_daily_jumpsup = pd.read_csv(
    df_daily_jumpsup_url, parse_dates = True, index_col = "timestamp"
)

# printing the training and testing data
print(df_small_noise.head())
print(df_daily_jumpsup.head())

In [None]:
import matplotlib.pyplot as plt

# plotting timeseries data without anomalies for training
fig, ax = plt.subplots()
df_small_noise.plot(legend = False, ax = ax)
plt.show()

In [None]:
import matplotlib.pyplot as plt

# plotting timeseries data with anomalies for testing
fig, ax = plt.subplots()
df_daily_jumpsup.plot(legend = False, ax = ax)
plt.show()

We take the values from the training CSV file and normalize them.  
Normalization here means standardising the values: subtracting the training mean and dividing by the training standard deviation, so that the data has zero mean and unit variance. This puts the inputs on a scale that neural networks train well on.  

The data is sampled every 5 minutes for 14 days:  

24 hours * 12 values per hour = 288 timesteps per day  
288 timesteps per day * 14 days = 4,032 data points in total
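
The arithmetic above can be checked with the standard library. This is a small sketch only; the start date is a hypothetical value chosen for illustration and is not read from the CSV files.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(minutes=5)
DAYS = 14

# Dividing one timedelta by another with // gives a whole number of steps.
steps_per_day = timedelta(days=1) // SAMPLE_INTERVAL
total_steps = steps_per_day * DAYS

# Hypothetical start time, used only to build an example index.
start = datetime(2014, 4, 1)
timestamps = [start + i * SAMPLE_INTERVAL for i in range(total_steps)]

print(steps_per_day, total_steps)
print(timestamps[0], timestamps[-1])
```

The last timestamp falls 5 minutes before the end of day 14, because the first sample sits at time zero.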

In [None]:
# normalizing by acquiring the mean and standard deviation (std)
training_mean = df_small_noise.mean()
training_std = df_small_noise.std()

df_training_value = (df_small_noise - training_mean) / training_std

print("Number of training samples:", len(df_training_value))
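
When the test data is prepared later, it would normally be scaled with the training mean and standard deviation rather than its own, so that both sets share one scale and unusual test values still stand out. The sketch below shows this on toy arrays; the function name `standardise` and the numbers are assumptions for illustration.

```python
import numpy as np


def standardise(values, mean, std):
    """Scale values using a mean and standard deviation supplied by the caller."""
    return (values - mean) / std


# Toy stand-ins for the training and test values.
train = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
test = np.array([14.0, 30.0])

train_mean = train.mean()
train_std = train.std(ddof=1)  # sample standard deviation, the pandas default

train_scaled = standardise(train, train_mean, train_std)
test_scaled = standardise(test, train_mean, train_std)  # reuses training statistics

print(train_scaled)
print(test_scaled)
```

The second test value ends up several standard deviations from zero, which is exactly the kind of signal that would be lost if the test set were scaled by its own statistics.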

In [None]:
import numpy as np

# Timesteps per day (12 values per hour multiplied by 24 hours)
time_steps = (12 * 24)

# function for generating training sequences for model
def create_sequences(values, time_steps = time_steps):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output)

x_train = create_sequences(df_training_value.values)
print("Training Input Shape:", x_train.shape)
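
To make the sliding window concrete, here is the same function applied to a toy series of five values with a window of three. The toy array and the window size are chosen purely for illustration.

```python
import numpy as np


def create_sequences(values, time_steps):
    """Stack every window of `time_steps` consecutive rows into one array."""
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : i + time_steps])
    return np.stack(output)


# Five samples with one feature each, shaped like a single-column DataFrame's values.
toy = np.arange(5, dtype=float).reshape(-1, 1)
windows = create_sequences(toy, time_steps=3)

print(windows.shape)
print(windows[:, :, 0])

# The same counting rule applied to the notebook's sizes.
expected_training_windows = 4032 - 288 + 1
print(expected_training_windows)
```

Each window overlaps the previous one by all but one sample, so a series of n samples yields n - time_steps + 1 windows.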

The autoencoder takes input of shape (batch_size, sequence_length, num_features) and returns output of the same shape. Here sequence_length is 288 and num_features is 1.

In [None]:
import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers
from matplotlib import pyplot as plt

model = keras.Sequential(
[
    layers.Input(shape = (x_train.shape[1], x_train.shape[2])),
    layers.Conv1D(
        filters = 32, kernel_size = 7, padding = "same", strides = 2, activation = "relu"
    ),
    layers.Dropout(rate = 0.2),
    layers.Conv1D(
        filters = 16, kernel_size = 7, padding = "same", strides = 2, activation = "relu"
    ),
    layers.Conv1DTranspose(
        filters = 16, kernel_size = 7, padding = "same", strides = 2, activation = "relu"
    ),
    layers.Dropout(rate = 0.2),
    layers.Conv1DTranspose(
        filters = 32, kernel_size = 7, padding = "same", strides = 2, activation = "relu"
    ),
    layers.Conv1DTranspose(filters = 1, kernel_size = 7, padding = "same"),
]
)

model.compile(optimizer = keras.optimizers.Adam(learning_rate = 0.001), loss = "mse")
model.summary()
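
To see why the output has the same length as the input, the sequence length can be traced through the layers. The sketch below assumes the usual behaviour of "same" padding in Keras, where a strided Conv1D produces ceil(length / strides) steps and a Conv1DTranspose produces length * strides steps. The helper names are hypothetical, and the Dropout layers are left out because they do not change the shape.

```python
import math


def conv1d_same_length(length, strides):
    """Output length of a Conv1D layer with padding="same"."""
    return math.ceil(length / strides)


def conv1d_transpose_same_length(length, strides):
    """Output length of a Conv1DTranspose layer with padding="same"."""
    return length * strides


# (layer type, strides) in the order used by the model above.
# The final Conv1DTranspose keeps the default strides of 1.
layer_specs = [
    ("Conv1D", 2),
    ("Conv1D", 2),
    ("Conv1DTranspose", 2),
    ("Conv1DTranspose", 2),
    ("Conv1DTranspose", 1),
]

length = 288  # one day of 5-minute samples
lengths = [length]
for layer_type, strides in layer_specs:
    if layer_type == "Conv1D":
        length = conv1d_same_length(length, strides)
    else:
        length = conv1d_transpose_same_length(length, strides)
    lengths.append(length)

print(lengths)
```

The two strided convolutions compress each window to a quarter of its length, and the two strided transposed convolutions expand it back, which is what makes the network an encoder followed by a decoder.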

We will use x_train as both the input and the target, because an autoencoder is trained to reconstruct its own input.

In [None]:
history = model.fit(
    x_train,
    x_train,
    epochs = 50,
    batch_size = 128,
    validation_split = 0.1,
    callbacks = [
        keras.callbacks.EarlyStopping(monitor = "val_loss", patience = 5, mode = "min")
    ],
)
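
Once training finishes, the cited Keras example scores each window by its reconstruction error and takes the largest training error as the anomaly threshold. The following is a minimal sketch of that idea using small stand-in arrays rather than real model output; in this notebook the predictions would come from model.predict(x_train). The helper name `reconstruction_mae` and the demo arrays are assumptions for illustration.

```python
import numpy as np


def reconstruction_mae(x, x_pred):
    """Mean absolute error per window, averaged over the time axis."""
    return np.mean(np.abs(x_pred - x), axis=1).reshape(-1)


# Stand-in "training" windows and reconstructions that are off by 0.1 everywhere.
x_train_demo = np.zeros((4, 288, 1))
x_train_pred_demo = x_train_demo + 0.1

train_mae = reconstruction_mae(x_train_demo, x_train_pred_demo)
threshold = train_mae.max()

# Stand-in "test" windows: one reconstructed well, one reconstructed badly.
x_test_demo = np.zeros((2, 288, 1))
x_test_pred_demo = x_test_demo.copy()
x_test_pred_demo[0] += 0.05
x_test_pred_demo[1] += 0.5

test_mae = reconstruction_mae(x_test_demo, x_test_pred_demo)
anomalous = test_mae > threshold

print(threshold)
print(anomalous)
```

Using the maximum training error as the threshold is a simple choice that assumes the training data contains no anomalies, which holds for the small noise file.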

References:  
https://keras.io/  
https://keras.io/examples/timeseries/timeseries_anomaly_detection/  
https://valueml.com/anomaly-detection-in-time-series-data-using-keras/  
https://towardsdatascience.com/time-series-of-price-anomaly-detection-with-lstm-11a12ba4f6d9  
https://en.wikipedia.org/wiki/Keras  
https://www.simplilearn.com/tutorials/deep-learning-tutorial/what-is-keras  
https://www.investopedia.com/terms/n/neuralnetwork.asp  
