# Introduction

An autoencoder is a type of neural network architecture that learns to encode input data into a lower-dimensional representation (encoding) and then decode it back to reconstruct the original input. It consists of two main components: an encoder and a decoder.

- **Encoder**: Takes the input data and compresses it into a lower-dimensional representation, also known as the encoding. Because the encoding has fewer dimensions than the input, the network is forced to keep only the key features and patterns.

- **Decoder**: Takes the encoding (the output of the encoder) and reconstructs the original input from it, producing an output that matches the input as closely as possible.

Autoencoders are trained using unsupervised learning, where the objective is to minimize the reconstruction error between the input data and the reconstructed output. By learning to reconstruct the input data, autoencoders effectively learn a compressed representation of the data while capturing its essential features.
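As a sketch, one common choice of reconstruction error is the mean squared error per sample. The symbols here are assumptions introduced for illustration: $x$ is an input with $d$ values and $\hat{x}$ is the output of the decoder for that input.

$$
e(x) = \frac{1}{d} \sum_{j=1}^{d} \left( x_j - \hat{x}_j \right)^2
$$

Training then minimizes the average of $e(x)$ over the training samples. This matches the `mean_squared_error` loss and the `np.mean(np.square(...), axis=1)` computation used in the exercises below.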

In the context of anomaly detection, autoencoders can be used to detect anomalies by leveraging the reconstruction error. During training, the autoencoder learns to reconstruct normal data with low error. However, anomalies, which deviate significantly from the normal data distribution, are likely to result in higher reconstruction errors.

After training, the autoencoder can be used to reconstruct new data samples. Anomalies are identified by comparing the reconstruction error of each data sample to a predefined threshold. Data samples with reconstruction errors above the threshold are considered anomalies.
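The thresholding step can be sketched with NumPy alone. The error values below are made up for illustration (in practice they come from a trained autoencoder), and taking the 99th percentile of the errors on normal data is only one possible way to choose the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reconstruction errors on normal data (assumption: small, positive)
normal_errors = rng.gamma(shape=2.0, scale=0.005, size=1000)

# Hypothetical reconstruction errors for four new samples
new_errors = np.array([0.004, 0.012, 0.2, 0.5])

# Choose the threshold so that about 1% of normal samples would be flagged
threshold = np.percentile(normal_errors, 99)

# A sample is considered anomalous when its error exceeds the threshold
is_anomaly = new_errors > threshold
print(is_anomaly)
```

Deriving the threshold from normal data only keeps it independent of how many anomalies the new data happens to contain.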

In summary, an autoencoder trained on normal data reconstructs normal samples well and anomalous samples poorly, so the reconstruction error can serve as an anomaly score. This is useful for data whose normal patterns are too complex to capture with simple rules.

In these exercises, we train an autoencoder to detect anomalies in a sinusoidal signal.

# Exercises

## Exercise 1

Generate the signal, add noise and normalize.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import random
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Generate synthetic sinusoidal signal
t = np.linspace(0, 100, 1000)
signal = np.sin(t) 

# Add anomalies to the signal
anomalies = np.random.normal(loc=0, scale=0.5, size=1000)  # Generate noise

signal_with_anomalies = np.append(signal[0:800], anomalies[0:200])  # Signal with anomalies

# Normalize the signals
max_val = np.max(np.append(anomalies, signal))
signal_normalized = signal / max_val
signal_with_anomalies_normalized = signal_with_anomalies / max_val

## Exercise 2

Generate sequences of the signal using a sliding time window.
Slide the window over the signal one sample at a time and collect every sub-sequence in a list.

Tip: what window length (number of time steps) would let the network learn the signal?
Tip: think about the periodicity of the signal.
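One way to reason about the window length is to count how many samples one period of the signal covers. The sketch below assumes the same time axis as in Exercise 1 and uses the fact that `sin(t)` has a period of 2π.

```python
import numpy as np

# Same time axis as in Exercise 1
t = np.linspace(0, 100, 1000)

# Spacing between consecutive samples
dt = t[1] - t[0]

# sin(t) repeats every 2*pi, so this is the number of samples per period
samples_per_period = 2 * np.pi / dt
print(f"One period spans about {samples_per_period:.1f} samples")
```

One period covers roughly 63 samples, so a window of about that length shows the network close to one full cycle.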

In [None]:
# Generate training sequences for use in the model.
TIME_STEPS = 60
def create_sequences(values, time_steps):
    output = []
    for i in range(len(values) - time_steps + 1):
        output.append(values[i : (i + time_steps)])
    return np.stack(output) 

signal_normalized = create_sequences(signal_normalized, TIME_STEPS)
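As an alternative sketch, NumPy's `sliding_window_view` produces the same windows without an explicit loop. The variable names below are chosen for illustration and are not part of the exercise code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

values = np.sin(np.linspace(0, 100, 1000))
time_steps = 60

# Each row is one window; consecutive rows are shifted by one sample
windows = sliding_window_view(values, time_steps)
print(windows.shape)  # (941, 60)
```

With 1000 samples and a window of 60, there are 1000 - 60 + 1 = 941 windows. The result is a read-only view of `values`, so call `.copy()` on it before modifying or shuffling it in place.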

## Exercise 3

Create a training set and a validation set.

In [None]:
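# Split the windows into training and validation sets
# (copy so that shuffling does not modify signal_normalized)
train_data = signal_normalized[:800].copy()
val_data = signal_normalized[800:].copy()
# Shuffle whole windows (rows) in place; Python's random.shuffle can
# duplicate or lose rows when applied to a 2-D NumPy array
np.random.shuffle(train_data)
np.random.shuffle(val_data)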
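When shuffling a 2-D array of windows, the shuffle has to move whole rows so that each window stays intact. A minimal sketch with a toy array, using NumPy's `Generator.permutation`, which returns a shuffled copy along the first axis:

```python
import numpy as np

rng = np.random.default_rng(42)

# Four toy "windows" of length 3
windows = np.arange(12).reshape(4, 3)

# Rows are reordered, but the values inside each row are unchanged
shuffled = rng.permutation(windows)
print(shuffled)
```

The original `windows` array is left untouched, which is convenient when the unshuffled order is still needed later.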

## Exercise 4

Create and train the autoencoder.

In [None]:
# Build the autoencoder model
input_layer = Input(shape=(TIME_STEPS,))
encoded = Dense(64, activation='relu')(input_layer)
encoded = Dense(32, activation='relu')(encoded)
encoded = Dense(16, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(decoded)
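# Linear output layer: the normalized signal contains negative values,
# which a sigmoid (output range 0 to 1) cannot reproduce
decoded = Dense(TIME_STEPS, activation='linear')(decoded)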

autoencoder = Model(input_layer, decoded)

# Compile the autoencoder
autoencoder.compile(optimizer='adam', loss='mean_squared_error')

# Train the autoencoder
autoencoder.fit(train_data, train_data, epochs=100, batch_size=32, shuffle=True, validation_data=(val_data, val_data))

## Exercise 5

Use the network to reconstruct the signal with anomalies and compute the reconstruction error.

In [None]:
# Add anomalies to the signal
anomalies = np.random.normal(loc=0, scale=0.5, size=1000)  # Generate noise
signal_with_anomalies = np.append(signal[0:800], anomalies[0:200])  # Signal with anomalies
signal_with_anomalies_normalized = signal_with_anomalies / max_val

test_data = create_sequences(signal_with_anomalies_normalized, TIME_STEPS)

# Predict on the test data
predictions = autoencoder.predict(test_data)

# Calculate the reconstruction error
reconstruction_error = np.mean(np.square(predictions - test_data), axis=1)

## Exercise 6

Plot the original signal, the signal with anomalies and the reconstruction error.

In [None]:
# Plot the original signal, signal with anomalies, and reconstruction error
plt.figure(figsize=(14, 7))
plt.subplot(3, 1, 1)
plt.title('Original Signal')
plt.plot(signal)

plt.subplot(3, 1, 2)
plt.title('Signal with Anomalies')
plt.plot(signal_with_anomalies)

plt.subplot(3, 1, 3)
plt.title('Reconstruction Error')
plt.plot(reconstruction_error, color='r')

plt.tight_layout()
plt.show()

## Exercise 7

How would you now detect the anomaly?

In [None]:
# Detect anomalies based on reconstruction error
threshold = np.mean(reconstruction_error) + 2 * np.std(reconstruction_error)  # Adjust the threshold as needed
anomaly_indices = np.where(reconstruction_error > threshold)[0]

# Each index refers to a window, i.e. the position of the window's first sample
print(f"Detected anomalous windows starting at indices: {anomaly_indices}")
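Each reconstruction error belongs to a window, not to a single sample, so a flagged index `i` covers the samples `i` to `i + TIME_STEPS - 1`. The helper below is a hypothetical sketch (its name and signature are not part of the exercise) that maps flagged windows back to the samples they cover.

```python
import numpy as np

def windows_to_sample_mask(window_indices, time_steps, n_samples):
    """Mark every sample covered by at least one flagged window."""
    mask = np.zeros(n_samples, dtype=bool)
    for i in window_indices:
        mask[i : i + time_steps] = True
    return mask

# Toy example: windows 3 and 4 (length 3) are flagged in a 10-sample signal
mask = windows_to_sample_mask(np.array([3, 4]), time_steps=3, n_samples=10)
print(np.flatnonzero(mask))  # [3 4 5 6]
```

In the exercise, this could be called with `anomaly_indices`, `TIME_STEPS` and `len(signal_with_anomalies)`. Note that a window is flagged as soon as it overlaps the anomalous region, so the mask can extend somewhat before the first truly anomalous sample.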
