# Anomaly Detection with sktime

This notebook demonstrates how to detect point outliers in time series data using sktime's anomaly detection capabilities.

In [None]:
import pathlib

import matplotlib.pyplot as plt
import pandas as pd

## What is Anomaly Detection?

Anomaly detection (also called outlier detection) identifies unusual data points in a time series. These can be:

1. **Point outliers**: Individual data points that are unusual compared to the whole time series (global) or neighboring points (local)
2. **Subsequence outliers**: Consecutive runs of points whose joint behavior is unusual compared to other subsequences of the same series
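To make the global versus local distinction concrete, here is a minimal sketch that uses only the Python standard library. It is not how sktime detects outliers: it scores each point by its distance from a median, in units of the median absolute deviation (MAD), once against the whole series and once against a small neighborhood. The function names, the window size, the threshold of 3, and the toy series are all illustrative assumptions.

```python
import statistics


def mad_score(value, reference):
    """Distance of `value` from the median of `reference`, in units of the MAD."""
    center = statistics.median(reference)
    mad = statistics.median([abs(r - center) for r in reference])
    if mad == 0:
        # Degenerate case: all reference values are (nearly) identical
        return 0.0 if value == center else float("inf")
    return abs(value - center) / mad


def global_outliers(values, threshold=3.0):
    """Indices of points that are unusual relative to the whole series."""
    return [i for i, v in enumerate(values) if mad_score(v, values) > threshold]


def local_outliers(values, half_window=3, threshold=3.0):
    """Indices of points that are unusual relative to their neighbors only."""
    flagged = []
    for i, v in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]  # exclude the point itself
        if mad_score(v, neighbors) > threshold:
            flagged.append(i)
    return flagged


# Toy upward ramp 1..20 with two injected anomalies (illustrative data only)
series = list(range(1, 21))
series[7] = 100  # far outside the overall range: a global outlier
series[14] = 5   # inside the overall range, but far from its neighbors: a local outlier

print(global_outliers(series))  # → [7]
print(local_outliers(series))   # → [7, 14]
```

The value 5 at index 14 is unremarkable for the series as a whole, so only the neighborhood-based check flags it.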

Anomaly detection is useful for:
- Removing unrealistic data points
- Finding points or areas of interest
- Quality control and monitoring

## Loading Example Data

We'll use the Yahoo dataset, which contains synthetic, labeled anomalies. In practice, outlier detection is usually an unsupervised learning task, so labels are rarely available.

In [None]:
# Relative path: assumes the notebook runs from a folder next to the sktime package directory
data_root = pathlib.Path("../sktime/datasets/data/")
df = pd.read_csv(data_root / "yahoo/yahoo.csv")
df.head()
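Before fitting a detector, it can help to check how rare the labeled anomalies are. The sketch below does this on a small, made-up CSV held in memory. Only the column names (`data`, `label`) mirror the ones used in this notebook; the values and the `toy_*` names are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV file; only the column names mirror this notebook
toy_csv = io.StringIO(
    "data,label\n"
    "1.0,0\n"
    "1.2,0\n"
    "9.5,1\n"
    "0.9,0\n"
    "1.1,0\n"
)
toy_df = pd.read_csv(toy_csv)

# Count observations per label and compute the share labeled as anomalous
label_counts = toy_df["label"].value_counts()
anomaly_share = (toy_df["label"] == 1).mean()

print(label_counts)
print(f"Share of labeled anomalies: {anomaly_share:.1%}")
```

The same two expressions can be applied to `df` once the real file has been loaded.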

## Visualizing the Data

Let's plot the time series with the labeled anomalies highlighted.

In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
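# The line shows every observation; labeled anomalies are overlaid as red markers
ax.plot(df["data"], label="Observed", alpha=0.7)

mask = df["label"] == 1.0
ax.scatter(
    df.loc[mask].index,
    df.loc[mask, "data"],
    label="Anomalous",
    color="red",
    s=50,
    zorder=3,  # draw the markers above the line
)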

ax.legend()
ax.set_ylabel("Value")
ax.set_xlabel("Time")
ax.set_title("Time Series with Labeled Anomalies")
plt.tight_layout()
plt.show()

## Detecting Anomalies with STRAY

sktime provides several algorithms for anomaly detection. Here we'll use STRAY (Search TRace AnomalY), an unsupervised method based on nearest-neighbor distances that flags individual points as outliers.

In [None]:
from sktime.detection.stray import STRAY

# Initialize and fit the model
model = STRAY()
model.fit(df["data"])

# Predict anomalies (True = anomalous, False = normal)
y_pred = model.transform(df["data"])

# Count detected anomalies
n_anomalies = y_pred.sum()
print(f"Number of anomalies detected: {n_anomalies}")
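Because this dataset happens to include labels, the detector's output can also be scored against them. The following is a minimal sketch of point-wise precision, recall, and F1, with anomalies as the positive class. The helper name `precision_recall_f1` and the toy inputs are hypothetical and not part of sktime.

```python
def precision_recall_f1(y_true, y_pred):
    """Point-wise precision, recall, and F1, treating anomalies as the positive class."""
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)      # labeled anomalous and detected
    fp = sum(p and not t for t, p in pairs)  # detected but labeled normal
    fn = sum(t and not p for t, p in pairs)  # labeled anomalous but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels and predictions, for illustration only
toy_labels = [0, 0, 1, 0, 1, 0, 0, 1]
toy_pred = [False, False, True, True, False, False, False, True]

precision, recall, f1 = precision_recall_f1(toy_labels, toy_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# → precision=0.67 recall=0.67 f1=0.67
```

With the objects from this notebook, the call would be `precision_recall_f1(df["label"], y_pred)`, assuming `y_pred` holds one boolean per row in the same order as `df`.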

## Visualizing Detected Anomalies

Let's plot the predicted anomalies alongside the original data.

In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
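# The line shows every observation; detected anomalies are overlaid as red crosses
ax.plot(df["data"], label="Observed", alpha=0.7)

# Plot detected anomalies
anomaly_mask = y_pred
ax.scatter(
    df.loc[anomaly_mask].index,
    df.loc[anomaly_mask, "data"],
    label="Detected Anomalies",
    color="red",
    s=50,
    marker="x",
    zorder=3,  # draw the markers above the line
)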

ax.legend()
ax.set_ylabel("Value")
ax.set_xlabel("Time")
ax.set_title("Detected Anomalies using STRAY")
plt.tight_layout()
plt.show()

## Summary

In this notebook, we demonstrated:
- What anomaly detection is and why it's useful
- How to use sktime's STRAY algorithm for point outlier detection
- How to visualize detected anomalies

For more advanced techniques and the other detection algorithms available in sktime, refer to the sktime API documentation.