# WMX Anomaly Detection using DC Diff

**This notebook uses DC diff data gathered by the WMX EtherCAT master to detect anomalies in a time-series pattern.**

In this notebook, we'll build an *LSTM Autoencoder*. The figure below shows an example of an autoencoder architecture.

<img src="https://lilianweng.github.io/posts/2018-08-12-vae/autoencoder-architecture.png" width="400">

## Preparation
### Install necessary Python libraries
Note that the Python environment where this notebook runs should already have **PyTorch** installed.

(To install **PyTorch**, go to https://pytorch.org/get-started/locally/)

In [19]:
!pip install scipy
!pip install pandas
!pip install seaborn
!pip install -U scikit-learn
!pip install -q -U watermark
!pip install datasets
!pip install huggingface-hub
!pip install ipywidgets


Collecting ipywidgets
  Using cached ipywidgets-8.1.5-py3-none-any.whl.metadata (2.3 kB)
Collecting widgetsnbextension~=4.0.12 (from ipywidgets)
  Using cached widgetsnbextension-4.0.13-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.12 (from ipywidgets)
  Using cached jupyterlab_widgets-3.0.13-py3-none-any.whl.metadata (4.1 kB)
Downloading ipywidgets-8.1.5-py3-none-any.whl (139 kB)
Downloading jupyterlab_widgets-3.0.13-py3-none-any.whl (214 kB)
Downloading widgetsnbextension-4.0.13-py3-none-any.whl (2.3 MB)
   ---------------------------------------- 0.0/2.3 MB ? eta -:--:--
   ---------------------------------------- 2.3/2.3 MB 33.4 MB/s eta 0:00:00
Installing collected packages: widgetsnbextension, jupyterlab-widgets, ipywidgets
Successfully installed ipywidgets-8.1.5 jupyterlab-widgets-3.0.13 widgetsnbextension-4.0.13


### Versions of the installed packages

In [14]:
%reload_ext watermark
%watermark -v -p numpy,pandas,torch,scipy

Python implementation: CPython
Python version       : 3.11.10
IPython version      : 8.29.0

numpy : 2.0.1
pandas: 2.2.3
torch : 2.5.1
scipy : 1.14.1



### Import packages and initialize them

In [15]:
import torch

import copy
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.model_selection import train_test_split

from torch import nn, optim

import torch.nn.functional as F
from datasets import Dataset
from datasets import load_dataset
from huggingface_hub import login


%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)

HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#ADFF02", "#8F00FF"]

sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

rcParams['figure.figsize'] = 12, 8

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

torch.manual_seed(RANDOM_SEED)

<torch._C.Generator at 0x21905191ed0>

### Download the DC diff dataset from Hugging Face and load it
Log in to Hugging Face before loading the dataset.
(You may need a Hugging Face account to log in.)

In [20]:
login()

# Load the dataset
dataset = load_dataset("Jake5/wmxdata") 

hg_df = pd.DataFrame(dataset['train'])
print(hg_df.head())

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Generating train split: 100%|██████████| 100047/100047 [00:00<00:00, 1945918.21 examples/s]


                         Timestamp  DcDiffAvg
0 2024-11-01 04:13:46.454280+09:00     780472
1 2024-11-01 04:13:46.584383+09:00     783309
2 2024-11-01 04:13:46.693861+09:00     785026
3 2024-11-01 04:13:46.803610+09:00     786223
4 2024-11-01 04:13:46.913217+09:00     787620


### Check whether CUDA is available and select the device

In [9]:
print(f"CUDA available={torch.cuda.is_available()}")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

CUDA available=True


# Exploratory Data Analysis
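
A first look at the series might start with the sampling interval and the basic statistics of `DcDiffAvg`. The sketch below assumes a DataFrame with the `Timestamp` and `DcDiffAvg` columns printed by `hg_df.head()` earlier. The small `sample_df` and the `summarize` helper are hypothetical names introduced for illustration; `sample_df` holds only the five rows shown above, so the block runs without the Hugging Face download.

```python
import pandas as pd

# Hypothetical stand-in for hg_df: the five rows printed by hg_df.head().
sample_df = pd.DataFrame({
    "Timestamp": [
        "2024-11-01 04:13:46.454280+09:00",
        "2024-11-01 04:13:46.584383+09:00",
        "2024-11-01 04:13:46.693861+09:00",
        "2024-11-01 04:13:46.803610+09:00",
        "2024-11-01 04:13:46.913217+09:00",
    ],
    "DcDiffAvg": [780472, 783309, 785026, 786223, 787620],
})


def summarize(df):
    """Return the sampling intervals (ms) and summary statistics of DcDiffAvg."""
    timestamps = pd.to_datetime(df["Timestamp"])
    # Time between consecutive samples, converted from seconds to milliseconds.
    intervals_ms = timestamps.diff().dropna().dt.total_seconds() * 1000.0
    return intervals_ms, df["DcDiffAvg"].describe()


intervals_ms, stats = summarize(sample_df)
print(intervals_ms.round(3).tolist())
print(stats)
```

Calling `summarize(hg_df)` on the full dataset should work the same way, provided the column names match.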

# LSTM Autoencoder

## Preprocessing the dataset (Normalization)
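
One possible way to normalize the series and cut it into fixed-length sequences is sketched below. Min-max scaling and the window length are assumptions, not choices confirmed by this notebook, and `min_max_normalize` and `make_windows` are hypothetical helper names. The example values are the five `DcDiffAvg` samples printed earlier.

```python
import numpy as np


def min_max_normalize(values):
    """Scale values to [0, 1]; also return the min and max used for scaling."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant series carries no range information; map it to zeros.
        return np.zeros_like(values), lo, hi
    return (values - lo) / (hi - lo), lo, hi


def make_windows(series, window_size):
    """Split a 1-D series into non-overlapping windows, dropping the remainder."""
    series = np.asarray(series)
    n_windows = len(series) // window_size
    return series[: n_windows * window_size].reshape(n_windows, window_size)


# The five DcDiffAvg values shown by hg_df.head(); window_size=2 is arbitrary.
raw = np.array([780472, 783309, 785026, 786223, 787620])
scaled, lo, hi = min_max_normalize(raw)
windows = make_windows(scaled, window_size=2)
print(windows.shape)
```

To avoid leaking information from the evaluation data, the `lo` and `hi` values would normally be computed on the training portion only and then reused for the rest.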

## Building an LSTM Autoencoder
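
A minimal sketch of an LSTM autoencoder in PyTorch follows. The encoder compresses each sequence into a single embedding vector, and the decoder repeats that vector along the time axis and reconstructs the sequence from it. The class names, the single-layer design, and the sizes (`seq_len=20`, `embedding_dim=16`) are assumptions for illustration, not the architecture used by this notebook.

```python
import torch
from torch import nn


class Encoder(nn.Module):
    def __init__(self, n_features, embedding_dim):
        super().__init__()
        self.lstm = nn.LSTM(n_features, embedding_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        _, (hidden, _) = self.lstm(x)
        # Final hidden state of the last layer: (batch, embedding_dim)
        return hidden[-1]


class Decoder(nn.Module):
    def __init__(self, seq_len, embedding_dim, n_features):
        super().__init__()
        self.seq_len = seq_len
        self.lstm = nn.LSTM(embedding_dim, embedding_dim, batch_first=True)
        self.output_layer = nn.Linear(embedding_dim, n_features)

    def forward(self, z):
        # Repeat the embedding at every time step: (batch, seq_len, embedding_dim)
        repeated = z.unsqueeze(1).repeat(1, self.seq_len, 1)
        out, _ = self.lstm(repeated)
        # Map each time step back to the feature space.
        return self.output_layer(out)


class LSTMAutoencoder(nn.Module):
    def __init__(self, seq_len, n_features, embedding_dim=16):
        super().__init__()
        self.encoder = Encoder(n_features, embedding_dim)
        self.decoder = Decoder(seq_len, embedding_dim, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))


model = LSTMAutoencoder(seq_len=20, n_features=1)
batch = torch.rand(8, 20, 1)  # random stand-in for normalized DC diff windows
reconstruction = model(batch)
print(reconstruction.shape)
```

The reconstruction has the same shape as the input, so the two can be compared element by element to obtain a reconstruction error.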

# Training
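
An autoencoder is typically trained to minimize the error between its input and its reconstruction. The loop below is a sketch under assumptions: the L1 loss, the Adam optimizer, the learning rate, and full-batch updates are illustrative choices, and `train_autoencoder` is a hypothetical helper name. A tiny linear model stands in for the LSTM autoencoder so that the block is self-contained.

```python
import torch
from torch import nn


def train_autoencoder(model, windows, n_epochs=30, lr=1e-2):
    """Full-batch training; returns the reconstruction loss of every epoch."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.L1Loss()
    history = []
    model.train()
    for _ in range(n_epochs):
        optimizer.zero_grad()
        reconstruction = model(windows)
        # The target of an autoencoder is its own input.
        loss = criterion(reconstruction, windows)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history


torch.manual_seed(0)
# Stand-in model and random data, used only to exercise the loop.
stand_in_model = nn.Sequential(nn.Linear(16, 4), nn.Linear(4, 16))
windows = torch.rand(32, 16)
history = train_autoencoder(stand_in_model, windows)
print(f"first epoch loss={history[0]:.4f}, last epoch loss={history[-1]:.4f}")
```

For a dataset of this size, mini-batches and a held-out validation split would usually replace the single full-batch update per epoch.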

## Saving the model

## Loading the model if necessary

## Choosing a threshold
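
A common way to pick a threshold is to take a high percentile of the reconstruction losses measured on normal data, then flag any sequence whose loss exceeds it. The sketch below assumes that approach; the 99th percentile and the helper names `choose_threshold` and `flag_anomalies` are hypothetical, and the loss values are synthetic placeholders rather than results from this notebook.

```python
import numpy as np


def choose_threshold(normal_losses, percentile=99.0):
    """Return the given percentile of the losses observed on normal data."""
    return float(np.percentile(normal_losses, percentile))


def flag_anomalies(losses, threshold):
    """Return a boolean array that is True where the loss exceeds the threshold."""
    return np.asarray(losses) > threshold


# Synthetic placeholder losses, evenly spaced between 0 and 1.
normal_losses = np.linspace(0.0, 1.0, 101)
threshold = choose_threshold(normal_losses)
flags = flag_anomalies([0.5, 1.5], threshold)
print(threshold, flags)
```

A lower percentile flags more sequences and raises the false-alarm rate; a higher one misses more anomalies. The right trade-off depends on how the loss distribution of normal DC diff data actually looks.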

## Normal DC Diff

## Anomalies

## Looking at Examples