# implement autoencoder based on lstm, cnn and dense layers

When working with time series data, using dense layers (fully connected layers) with an input size equal to the sequence length is generally not ideal for several reasons:

1. Loss of Temporal Dependencies
Time series data has an inherent sequential nature where the order and timing of data points matter. Dense layers treat each input feature independently, ignoring the temporal dependencies between consecutive time points. This can lead to poor performance in capturing the patterns and relationships that are crucial in time series data.

2. Fixed Input Size
Dense layers require a fixed input size, meaning the model architecture is rigid and cannot easily adapt to sequences of varying lengths. In many real-world applications, time series data can have varying lengths, necessitating a more flexible approach.

3. Parameter Inefficiency
Dense layers with a large input size result in a vast number of parameters, making the model computationally expensive and prone to overfitting, especially when the sequence length is long. This inefficiency can be a significant drawback when working with high-dimensional time series data.

4. Ineffective for Long-Term Dependencies
Dense layers are not well-suited for capturing long-term dependencies within the data. Techniques like Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) are specifically designed to handle sequences and can remember information over long periods, making them more effective for time series analysis.

5. Lack of Temporal Feature Extraction
Dense layers do not have mechanisms to extract meaningful temporal features like trends and seasonal patterns inherent in time series data. Convolutional Neural Networks (CNNs) with temporal convolutions or RNNs/LSTMs/GRUs can better capture these features by applying convolutional or recurrent operations over the sequence.

Alternative Approaches
For these reasons, the following approaches are generally preferred for time series data:

Recurrent Neural Networks (RNNs): Capture temporal dependencies by maintaining a hidden state that gets updated at each time step.
Long Short-Term Memory (LSTM) Networks: Address the vanishing gradient problem of RNNs and can capture long-term dependencies.
Gated Recurrent Units (GRUs): A simplified version of LSTMs that can also handle long-term dependencies.
Convolutional Neural Networks (CNNs): Can be applied to time series data to capture local temporal patterns through temporal convolutions.
Temporal Convolutional Networks (TCNs): Use dilated convolutions to capture long-range dependencies while maintaining computational efficiency.
These approaches are designed to leverage the sequential nature of time series data, leading to better performance in tasks such as forecasting, classification, and anomaly detection.

In [1]:
# LSTM Autoencoders 

In [2]:
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])  # Set the memory limit as needed
    except RuntimeError as e:
        print(e)


2024-07-22 16:32:40.074853: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-22 16:32:40.106728: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-22 16:32:40.115605: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-22 16:32:40.137424: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0000 00:00:1721658762.507714   98288 cuda_executor.c

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
import numpy as np
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from pandas.plotting import register_matplotlib_converters

In [4]:
data = pd.read_csv("times_series_data_no_labels.csv" ,
    index_col='datetime',
    parse_dates=['datetime']
    )

data = data.asfreq('5min')
data.describe()

Unnamed: 0,data_0,data_1
count,51840.0,51840.0
mean,27.428187,27.427566
std,4.276855,4.281787
min,16.042714,16.342305
25%,23.79225,23.832418
50%,29.712173,29.709107
75%,30.188862,30.189345
max,41.066048,41.122645


In [5]:
data

Unnamed: 0_level_0,data_0,data_1
datetime,Unnamed: 1_level_1,Unnamed: 2_level_1
2023-01-01 00:00:00,21.719925,19.925141
2023-01-01 00:05:00,21.357656,19.671888
2023-01-01 00:10:00,20.178934,19.543689
2023-01-01 00:15:00,19.197688,18.872886
2023-01-01 00:20:00,20.098658,19.599005
...,...,...
2023-06-29 23:35:00,19.636588,20.640584
2023-06-29 23:40:00,20.692796,19.895390
2023-06-29 23:45:00,20.081966,20.584634
2023-06-29 23:50:00,19.956621,20.553717


3- Anomaly Detection with LSTM Autoencoders.

In this method we will depend on the detection using the forecasting by Deep Learning algorithms. In the forecasting methods we depend on predict the next point with the addition of some noise and make comparison of this point and the true point at this timestamp by finding the difference between the two points then add threshold finally find the anomalies by compare the difference of the two points with this threshold (we used the Mean absolute error MAE).

Autoencoders are type of self-supervised learning model which are a neural network that learn from the input data. We use autoencoder because the Principal Component Analysis (PCA), which we used in the previous method we depend on the linear algebra to do the models, but by using autoencoders we depended on the non-linear transformation like by use the activation functions; those non-linearity gives us the ability to go deep in the number of the neural network layers.

Long Short-Term Memory (LSTM) is a type of artificial recurrent neural network (RNN). which are designed to handle sequential data, with the previous step's output being fed as the current step's input.

We apply some dimensionality reduction on our dataset by use encoder to make the dimension small then use the decoder to get it back and that minimize the reconstruction loss. In fact, that will make us lose some information but it gives us the ability to know the main pattern of the information and thought that we could define any information out hits pattern under sone threshold will be outlier.

In [6]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset

# %matplotlib inline
# %config InlineBackend.figure_format='retina'

# sns.set(style='whitegrid', palette='muted', font_scale=1.5)
# rcParams['figure.figsize'] = 22, 10

RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

# Load and prepare data
# data = pd.read_csv('/kaggle/input/milan-dataset/final_data.csv', parse_dates=['time'], index_col='time')
# data = data.groupby("grid_square").get_group(5056)
train_size = int(len(data) * 0.85)
test_size = len(data) - train_size
train, test = data.iloc[0:train_size], data.iloc[train_size:len(data)]
print(train.shape, test.shape)

# Standardize the data
scaler = StandardScaler()
scaler.fit(train[['data_0']])

# Transform and explicitly cast to float64
train_transformed = scaler.transform(train[['data_0']]).astype('float64')
test_transformed = scaler.transform(test[['data_0']]).astype('float64')

# Assign the transformed values back to the DataFrame
train = pd.DataFrame(train_transformed, columns=['data_0'])
test = pd.DataFrame(test_transformed, columns=['data_0'])

def create_dataset(X, y, time_steps=1):
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        v = X.iloc[i:(i + time_steps)].values
        Xs.append(v)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)

TIME_STEPS = 30

# reshape to [samples, time_steps, n_features]

X_train, y_train = create_dataset(train[['data_0']], train.data_0, TIME_STEPS)
X_test, y_test = create_dataset(test[['data_0']], test.data_0, TIME_STEPS)

print(X_train.shape)

# # Convert to PyTorch tensors
# X_train = torch.tensor(X_train, dtype=torch.float32)
# y_train = torch.tensor(y_train, dtype=torch.float32)
# X_test = torch.tensor(X_test, dtype=torch.float32)
# y_test = torch.tensor(y_test, dtype=torch.float32)

# print(X_train.size(), X_test.size(), y_train.size(), y_test.size())
# # Create DataLoader
# train_dataset = TensorDataset(X_train, y_train)
# test_dataset = TensorDataset(X_test, y_test)
# train_loader = DataLoader(train_dataset, batch_size=32, shuffle=False)
# test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)


(44064, 2) (7776, 2)
(44034, 30, 1)


In [7]:
y_train.shape

(44034,)

In [8]:
input_shape = (X_train.shape[1], X_train.shape[2])

# Create the model
model = keras.Sequential()

# Add an Input layer
model.add(keras.layers.Input(shape=input_shape))

# Add the rest of the layers
model.add(keras.layers.LSTM(units=64))
model.add(keras.layers.Dropout(rate=0.2))
model.add(keras.layers.RepeatVector(n=X_train.shape[1]))
model.add(keras.layers.LSTM(units=64, return_sequences=True))
model.add(keras.layers.Dropout(rate=0.2))
model.add(keras.layers.TimeDistributed(keras.layers.Dense(units=X_train.shape[2])))
model.compile(loss='mae', optimizer='adam')
model.summary()

I0000 00:00:1721658766.791834   98288 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1721658766.792473   98288 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1721658766.793002   98288 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1721658766.897298   98288 cuda_executor.cc:1015] successful NUMA node read from SysFS ha

In [9]:
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.1,
    shuffle=False
)

Epoch 1/20


2024-07-22 16:32:49.790880: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:531] Loaded cuDNN version 8907


[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m18s[0m 12ms/step - loss: 0.3262 - val_loss: 0.2794
Epoch 2/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.2706 - val_loss: 0.2864
Epoch 3/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 11ms/step - loss: 0.2697 - val_loss: 0.2730
Epoch 4/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 12ms/step - loss: 0.2717 - val_loss: 0.2827
Epoch 5/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.2752 - val_loss: 0.2810
Epoch 6/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 12ms/step - loss: 0.2685 - val_loss: 0.2978
Epoch 7/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.2616 - val_loss: 0.2856
Epoch 8/20
[1m1239/1239[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 12ms/step - loss: 0.2662 - val_loss: 0.2998
Epoch 9/20
[1m1239

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()

In [None]:
X_test_pred = model.predict(X_test)

test_mae_loss = np.mean(np.abs(X_test_pred - X_test), axis=1)
len(test_mae_loss)

In [None]:
sns.distplot(test_mae_loss, bins=50, kde=True)

In [None]:
X_train_pred = model.predict(X_train, verbose=0)
train_mae_loss = np.mean(np.abs(X_train_pred - X_train), axis=1)

plt.hist(train_mae_loss, bins=50)
plt.xlabel('Train MAE loss')
plt.ylabel('Number of Samples');

threshold = np.max(train_mae_loss)
print(f'Reconstruction error threshold: {threshold}')

In [None]:
THRESHOLD = 0.5

test_score_df = pd.DataFrame(index=test[TIME_STEPS:].index)
test_score_df['loss'] = test_mae_loss
test_score_df['threshold'] = THRESHOLD
test_score_df['anomaly'] = test_score_df.loss > test_score_df.threshold
test_score_df['data_0'] = test[TIME_STEPS:].data_0

In [None]:
plt.plot(test_score_df.index, test_score_df.loss, label='loss')
plt.plot(test_score_df.index, test_score_df.threshold, label='threshold')
plt.xticks(rotation=25)
plt.title('test_score_loss vs. threshold')
plt.legend()



In [None]:
anomalies = test_score_df[test_score_df.anomaly == True]
anomalies

In [None]:
test_score_df['anomaly'].value_counts()

In [None]:
scaler.inverse_transform(test[TIME_STEPS:])

In [None]:
anomalies

In [None]:
scaler.inverse_transform(anomalies)

In [None]:
plt.plot(
  test[TIME_STEPS:].index, 
  scaler.inverse_transform(test[TIME_STEPS:]), 
  label='data_0'
)

sns.scatterplot(
    x=anomalies.index.to_numpy(),
    y=scaler.inverse_transform(anomalies[["data_0"]]).reshape(-1),
    color=sns.color_palette()[3],
    s=52,
    label='anomaly'
)
plt.xticks(rotation=25)
plt.title('Anomalies')
plt.legend();

In [None]:
# months = data.index.to_period('M').unique()
# import plotly.subplots as sp
# import plotly.graph_objs as go

# plots = []
# for month in months:
#     monthly_data = data[data.index.to_period('M') == month]

#     fig = sp.make_subplots(rows=1, cols=1, shared_xaxes=True, 
#                            subplot_titles=('data_0', 'data_1', 'diff'),
#                            vertical_spacing=0.03, horizontal_spacing=0.02)

#     fig.add_trace(go.Scatter(x=monthly_data.index, y=monthly_data['data_0'], name='data_0', mode='lines'), row=1, col=1)
#     # fig.add_trace(go.Scatter(x=monthly_data.index, y=monthly_data['data_1'], name='data_1', mode='lines'), row=2, col=1)
#     # fig.add_trace(go.Scatter(x=monthly_data.index, y=monthly_data['diff'], name='diff', mode='lines'), row=3, col=1)

#     # Find the indices where the difference exceeds the threshold
#     anomaly_indices = monthly_data[monthly_data['anomalies']].index

#     # Add markers for anomalies
#     fig.add_trace(go.Scatter(x=anomaly_indices, y=monthly_data.loc[anomaly_indices, 'data_0'], 
#                              mode='markers', name='Anomaly data_0', 
#                              marker=dict(color='red', size=10)), row=1, col=1)
    
#     # fig.add_trace(go.Scatter(x=anomaly_indices, y=monthly_data.loc[anomaly_indices, 'data_1'], 
#     #                          mode='markers', name='Anomaly data_1', 
#     #                          marker=dict(color='blue', size=10)), row=2, col=1)
    
#     fig.update_layout(title_text=f"Data for {month}")
#     plots.append(fig)

# # Display the plots
# for plot in plots:
#     plot.show()