Classification and Regression Trees (CART)
We can utilize the power and robustness of Decision Trees to identify outliers/anomalies in time series data.

First, we could use supervised learning to teach trees to classify anomalous and normal data points. To do that, we’d need labeled anomalies, which you won’t find often outside of toy datasets.
Unsupervised is what you need! We can use the Isolation Forest algorithm to predict whether a certain point is an outlier or not, without the help of any labeled dataset. Let’s see how.
The main idea, which is different from other popular outlier detection methods, is that Isolation Forest explicitly identifies anomalies instead of profiling normal data points. Isolation Forest, like any tree ensemble method, is based on decision trees.

In other words, Isolation Forest detects anomalies purely based on the fact that anomalies are data points that are few and different. Anomalies are isolated without employing any distance or density measure.
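
To make the isolation idea concrete, here is a minimal one-dimensional sketch. It is not the scikit-learn implementation: it only counts how many random splits are needed before a point sits alone in its partition. The helper names (isolation_path_length, mean_path), the synthetic data, and the planted value 12.0 are assumptions made for illustration. A real Isolation Forest additionally subsamples the data and picks a random feature for every split.

```python
import random


def isolation_path_length(values, target, rng):
    """Count the random splits needed until `target` is alone in its partition."""
    current = list(values)
    depth = 0
    while len(current) > 1:
        lo, hi = min(current), max(current)
        if lo == hi:
            break  # only duplicates left, nothing more to split
        split = rng.uniform(lo, hi)
        # Keep only the side of the split that still contains the target
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth


rng = random.Random(0)
inliers = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic "normal" points
outlier = 12.0                                       # planted anomaly (illustrative)
values = inliers + [outlier]


def mean_path(target, n_trees=200):
    """Average path length over many random 'trees'."""
    total = sum(isolation_path_length(values, target, rng) for _ in range(n_trees))
    return total / n_trees


typical_point = sorted(inliers)[len(inliers) // 2]  # the median inlier
outlier_depth = mean_path(outlier)
inlier_depth = mean_path(typical_point)
print(f"outlier: {outlier_depth:.1f} splits, typical point: {inlier_depth:.1f} splits")
```

The planted point needs far fewer splits on average than a point in the middle of the data, and no distance or density was computed along the way.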

When applying an IsolationForest model, we set contamination = outliers_fraction, which tells the model what proportion of outliers to expect in the data. This value is usually found by trial and error.
fit_predict(data) performs outlier detection on the data and returns 1 for normal points and -1 for anomalies.
Finally, we visualize anomalies with the Time Series view.
Let’s do it step by step. First, visualize the time series data:



As you can see, the algorithm did a pretty good job in identifying our planted anomalies, but it also labeled a few points at the start as “outlier”. This is due to two reasons:

At the start, the algorithm is too naive to recognize what qualifies as an anomaly. The more data it gets, the more variance it’s able to see, and it adjusts itself.
If you see many false positives (normal points flagged as anomalies), that means your contamination parameter is too high. Conversely, if you don’t see the red dots where they should be, the contamination parameter is set too low.
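
The effect of the contamination parameter can be checked directly. The sketch below uses synthetic Gaussian data (an assumption, not the dataset from this notebook) and fits the same model with two contamination values to compare how many points get flagged.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(0, 1, size=(1000, 2))  # synthetic data with no planted anomalies

flagged = {}
for contamination in (0.01, 0.05):
    model = IsolationForest(contamination=contamination, random_state=42)
    labels = model.fit_predict(X)  # 1 = normal, -1 = anomaly
    flagged[contamination] = int((labels == -1).sum())

print(flagged)
```

Because the threshold is derived from the requested fraction, a higher contamination value always flags roughly that share of the data, whether or not the points are truly anomalous.
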
Pros

The biggest advantage of this technique is you can introduce as many random variables or features as you like to make more sophisticated models.

Cons

The weakness is that a growing number of features can start to impact your computational performance fairly quickly. In this case, you should select features carefully.

In [None]:
## oneclass svm and isolation forest
# LOF (Local Outlier Factor): Detects anomalies by evaluating the local density deviation of data points.

As we can see, outliers are easier to isolate, while inliers are harder to isolate.

Here the algorithm tries to chase down the actuals. Although this may be a good forecast with low error, the anomalous behaviour in the actuals can’t be identified from it.

This is a problem with using forecasting techniques for anomaly detection. We want to capture the trend and seasonality in the data without optimising the error so much that the forecast becomes an exact replica of the actuals, which would make anomalies difficult to find.

When using forecasting to detect anomalies, every metric needs to be validated and its parameters fine-tuned so that anomalies are actually detected. Metrics with different data distributions also need different approaches to identifying anomalies.

One more con: with Isolation Forest we detected anomalies for a use case comprising multiple metrics at once and then drilled down to the anomalies on the individual metrics. With a forecasting mechanism, we need separate correlation logic, because each metric is forecast individually.

An algorithm like Isolation Forest, by contrast, separates anomalous behavior from the rest of the data, which generalizes to multiple metrics.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
import plotly.graph_objects as go
import plotly.subplots as sp
from sklearn.ensemble import IsolationForest
from plotly.subplots import make_subplots
from plot_anomaly import multivariate_anomaly_plot

# Load the unlabeled time series data (replace the path with your own file)
# The CSV needs a 'datetime' column plus the sensor columns 'data_0' and 'data_1' used below
data = pd.read_csv("times_series_data_no_labels.csv" ,
    index_col='datetime',
    parse_dates=['datetime']
    )

data.describe()

## Raw input

In [None]:
iso_forest = IsolationForest(contamination=0.005, random_state=42)

# Fit the model
data['anomaly'] = iso_forest.fit_predict(data[['data_0', 'data_1']])
# fit_predict returns -1 for anomalies and 1 for normal points
data['is_anomaly'] = data['anomaly'] == -1

multivariate_anomaly_plot(data)

## Isolation forest - Feature engineering

Additional IsolationForest parameters worth trying: n_estimators=200, max_samples=0.7

In [None]:
data = pd.read_csv("times_series_data_no_labels.csv" ,
    index_col='datetime',
    parse_dates=['datetime']
    )

data.describe()

In [None]:
from sklearn.preprocessing import StandardScaler

window = 288
data['hour'] = data.index.hour
data['minute'] = data.index.minute
data['timestamp'] = data['hour'] * 60 + data['minute']
data['lag1_sensor1'] = data['data_0'].shift(1)
data['lag1_sensor2'] = data['data_1'].shift(1)

data['lag2_sensor1'] = data['data_0'].shift(2)
data['lag2_sensor2'] = data['data_1'].shift(2)
data['rolling_mean_sensor1'] = data['data_0'].rolling(window=window).mean()
data['rolling_std_sensor1'] = data['data_0'].rolling(window=window).std()
data['rolling_mean_sensor2'] = data['data_1'].rolling(window=window).mean()
data['rolling_std_sensor2'] = data['data_1'].rolling(window=window).std()

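def night_time_indicator(dt):
    # 0 for night (23:00 to 03:59), 1 for daytime
    if dt.hour >= 23 or dt.hour < 4:
        return 0
    return 1


def ramp_up_down_time_indicator(dt):
    # 3 for the ramp-up hour (04:00 to 04:59), 2 for the ramp-down hour (22:00 to 22:59).
    # The original conditions used `or`, which made the first branch always true.
    if 4 <= dt.hour < 5:
        return 3
    if 22 <= dt.hour < 23:
        return 2
    # Outside the ramp hours, fall back to the night/day indicator
    return night_time_indicator(dt)

# Apply the combined indicator once, so the second call no longer overwrites the first
data['daytime_indicator'] = data.index.map(ramp_up_down_time_indicator)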

# Drop NaN values
data.dropna(inplace=True)

# Fit Isolation Forest
features = ['data_0', 
            'data_1', 
            'lag1_sensor1', 'lag1_sensor2', 
            'lag2_sensor1', 'lag2_sensor2', 
            # "hour", "minute",
            # "timestamp",
            "daytime_indicator",
            # 'rolling_mean_sensor1', 'rolling_std_sensor1', 'rolling_mean_sensor2', 'rolling_std_sensor2'
            ]

scaler = StandardScaler()
np_scaled = scaler.fit_transform(data[features])
data_scaled = pd.DataFrame(np_scaled)

clf = IsolationForest(contamination=0.005, random_state=42)
clf.fit(data_scaled)
data['anomaly'] = clf.predict(data_scaled)

# -1 for anomalies, 1 for normal

data['is_anomaly'] = data['anomaly'] == -1

multivariate_anomaly_plot(data)

## Another implementation of isolation forest

In [None]:
data = pd.read_csv("times_series_data_no_labels.csv" ,
    index_col='datetime',
    parse_dates=['datetime']
    )

data.describe()

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
np_scaled = scaler.fit_transform(data)

data_scaled = pd.DataFrame(np_scaled)

# Isolation forest 
outliers_fraction = 0.005
ifo = IsolationForest(contamination = outliers_fraction)

ifo.fit(data_scaled)
data['anomaly'] = ifo.predict(data_scaled)


# -1 for anomalies, 1 for normal
data['is_anomaly'] = data['anomaly'] == -1

multivariate_anomaly_plot(data)

Detection using Forecasting
Anomaly detection using forecasting is based on the idea that several points from the past generate a forecast of the next point, with the addition of some random variable, which is usually white noise.

As you can imagine, forecasted points are in turn used to generate further points, and so on. The obvious effect on the forecast horizon is that the signal gets smoother.
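
The smoothing effect can be seen in a tiny recursive forecast. The sketch assumes a hypothetical AR(1) model, x_t = phi * x_(t-1) + noise, with an illustrative coefficient and starting value; neither comes from the data in this notebook.

```python
# Recursive multi-step forecast with a hypothetical AR(1) model
phi = 0.8             # assumed autoregressive coefficient (illustrative)
last_observed = 10.0  # assumed last actual value (illustrative)
horizon = 8

forecasts = []
previous = last_observed
for _ in range(horizon):
    # White noise has an expected value of zero, so the point forecast drops it
    previous = phi * previous
    forecasts.append(previous)

print([round(f, 3) for f in forecasts])
```

Each forecast is built on the previous forecast rather than on a noisy actual, so the curve decays steadily toward the long-run mean (zero here) instead of showing the variation a real signal would have.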

The difficult part of using this method is that you have to select the number of differences, the number of autoregressive terms, and the number of forecast error coefficients (the d, p, and q of an ARIMA model).

Each time you work with a new signal, you should build a new forecasting model.

Another obstacle is that your signal should be stationary after differencing. In simple words, its statistical properties, such as mean and variance, shouldn’t depend on time, which is a significant constraint.
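
As a rough illustration of what differencing does, the sketch below builds a synthetic series with a linear trend (an assumption for demonstration only) and compares the mean of the first and second half before and after first-order differencing.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(200)
trend_series = 0.5 * t + rng.normal(0, 1.0, size=t.size)  # synthetic: linear trend plus noise

# First-order differencing: x_t - x_(t-1)
differenced = np.diff(trend_series)

# The mean of the raw series drifts with time ...
first_half_mean = trend_series[:100].mean()
second_half_mean = trend_series[100:].mean()

# ... while the mean of the differenced series stays roughly constant
diff_first = differenced[:100].mean()
diff_second = differenced[100:].mean()

print(f"raw halves: {first_half_mean:.2f} vs {second_half_mean:.2f}")
print(f"differenced halves: {diff_first:.2f} vs {diff_second:.2f}")
```

Comparing half means is only an informal check; it does not replace a proper stationarity test.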

We can utilize different forecasting methods such as Moving Averages, Autoregressive approach, and ARIMA with its different variants. The procedure for detecting anomalies with ARIMA is:

Predict the new point from past data points and compute the magnitude of the difference between the prediction and the actual value.
Choose a threshold and flag the points whose difference exceeds it as anomalies. That’s it!
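
The two steps above can be sketched with the simplest forecaster mentioned earlier, a moving average, standing in for ARIMA. The synthetic sine signal, the planted spike at index 300, the window of 5, and the threshold of three residual standard deviations are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
t = np.arange(n)
signal = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.1, n)
signal[300] += 3.0  # planted anomaly (synthetic)
series = pd.Series(signal)

# Step 1: one-step-ahead forecast from the previous `window` points.
# shift(1) keeps the current point out of its own forecast.
window = 5
forecast = series.shift(1).rolling(window).mean()
residual = series - forecast

# Step 2: flag points whose forecast error exceeds the threshold
threshold = 3 * residual.std()
anomalies = residual.abs() > threshold

print(series[anomalies])
```

The threshold multiplier plays the same role as contamination in Isolation Forest: it has to be tuned per metric.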
To test this technique, we’re going to use a popular time series module called fbprophet (published as prophet since version 1.0). This module is designed for series with trend and seasonality, and can be tuned with some hyper-parameters.

Pros

This algorithm nicely handles different seasonality periods, such as monthly or yearly, and it works natively with any time series metric.

If you look closely, this algorithm handles edge cases better than the Isolation Forest algorithm.

Cons

Since this technique is based on forecasting, it will struggle in limited data scenarios. The quality of prediction in limited data will be lower, and so will the accuracy of anomaly detection.

