# Anomsmith: Introduction and Basic Anomaly Detection

This notebook introduces the Anomsmith library and walks through a basic anomaly detection workflow.

## What is Anomsmith?

Anomsmith is an anomaly detection library organized into four strict layers with hard boundaries between them:
- **Layer 1**: Data and Representations (SeriesView, PanelView, etc.)
- **Layer 2**: Algorithm Interfaces (BaseScorer, BaseDetector)
- **Layer 3**: Task Orchestration (DetectTask, helpers)
- **Layer 4**: Public API (detect_anomalies, score_anomalies, etc.)
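
To make the layering concrete, here is a minimal sketch of how four such layers could fit together. Every name in it (`SeriesViewSketch`, `ScorerSketch`, `AbsDeviationScorer`, `run_detect_sketch`, `detect_sketch`) is hypothetical and written only for illustration. It is not Anomsmith's actual code or API, only a toy that mirrors the layer responsibilities listed above.

```python
from dataclasses import dataclass
from typing import Protocol

import numpy as np
import pandas as pd


# Layer 1 (data): a frozen container holding one series
@dataclass(frozen=True)
class SeriesViewSketch:
    index: pd.Index
    values: np.ndarray


# Layer 2 (algorithm interface): anything that maps a view to scores
class ScorerSketch(Protocol):
    def score(self, view: SeriesViewSketch) -> np.ndarray: ...


class AbsDeviationScorer:
    """Toy scorer: absolute distance from the median."""

    def score(self, view: SeriesViewSketch) -> np.ndarray:
        return np.abs(view.values - np.median(view.values))


# Layer 3 (task orchestration): combine a view, a scorer, and a threshold
def run_detect_sketch(
    view: SeriesViewSketch, scorer: ScorerSketch, threshold: float
) -> pd.DataFrame:
    scores = scorer.score(view)
    flags = (scores >= threshold).astype(int)
    return pd.DataFrame({"score": scores, "flag": flags}, index=view.index)


# Layer 4 (public API): accept plain pandas input and hide the layers below
def detect_sketch(y: pd.Series, scorer: ScorerSketch, threshold: float) -> pd.DataFrame:
    view = SeriesViewSketch(index=y.index, values=y.to_numpy(dtype=float))
    return run_detect_sketch(view, scorer, threshold)


toy = pd.Series([1.0, 1.1, 0.9, 6.0, 1.0])
toy_result = detect_sketch(toy, AbsDeviationScorer(), threshold=2.0)
print(toy_result)
```

In this sketch each layer only calls the one directly beneath it, so the scorer never sees pandas objects and the caller never builds a view by hand. That is one way to read "hard boundaries between layers".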

## Key Features

- Multiple anomaly detection algorithms (statistical, ML-based, PCA, change point)
- Flexible thresholding rules (absolute, quantile-based)
- Evaluation metrics and backtesting utilities
- Clean, modular architecture


In [None]:
# Import necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from plotsmith import plot_timeseries

from anomsmith import detect_anomalies, ThresholdRule
from anomsmith.primitives.scorers.robust_zscore import RobustZScoreScorer

np.random.seed(42)


## Creating Synthetic Data

Let's create a synthetic daily time series with two known anomalies, a spike at index 50 and a level shift starting at index 100, so we can check what the detector picks up.


In [None]:
def create_synthetic_series(n: int = 200, seed: int = 42) -> pd.Series:
    """Create synthetic time series with anomalies.
    
    Args:
        n: Length of series
        seed: Random seed for reproducibility
        
    Returns:
        pandas Series with injected anomalies
    """
    np.random.seed(seed)
    # Base series: linear trend plus Gaussian noise
    trend = np.linspace(0, 2, n)
    noise = np.random.randn(n) * 0.5
    y = trend + noise
    
    # Inject spike anomaly at index 50
    y[50] += 5.0
    
    # Inject level shift starting at index 100
    y[100:] += 3.0
    
    index = pd.date_range("2020-01-01", periods=n, freq="D")
    return pd.Series(y, index=index)

# Create the synthetic series
y = create_synthetic_series(n=200)
print(f"Created time series with {len(y)} points")
print(f"Date range: {y.index[0]} to {y.index[-1]}")
print(f"\nFirst few values:\n{y.head()}")


In [None]:
# Visualize the synthetic data
fig, ax = plot_timeseries(
    y,
    title='Synthetic Time Series with Injected Anomalies',
    xlabel='Date',
    ylabel='Value'
)
# Add vertical lines for injected anomalies
ax.axvline(y.index[50], color='r', linestyle='--', alpha=0.5, label='Injected Spike')
ax.axvline(y.index[100], color='orange', linestyle='--', alpha=0.5, label='Injected Level Shift')
ax.legend()
plt.show()


## Basic Anomaly Detection

Now let's use `RobustZScoreScorer` to detect anomalies in our time series.
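
For intuition about what a robust z-score measures, here is a minimal NumPy sketch of the common median/MAD definition. The function `robust_zscore`, the 1.4826 scaling constant, and the way `epsilon` guards against a zero MAD are assumptions for illustration. The notebook does not show how `RobustZScoreScorer` computes its scores, so the library may differ in detail.

```python
import numpy as np


def robust_zscore(values, epsilon=1e-8):
    """Score each point by its absolute distance from the median, in MAD units."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    # Median absolute deviation: a spread estimate that a few outliers barely move
    mad = np.median(np.abs(x - median))
    # 1.4826 rescales the MAD to match the standard deviation for Gaussian data;
    # epsilon avoids division by zero when the MAD is exactly 0
    scale = 1.4826 * mad + epsilon
    return np.abs(x - median) / scale


rng = np.random.default_rng(0)
demo = rng.normal(loc=0.0, scale=0.5, size=200)
demo[50] += 5.0  # one injected spike
demo_scores = robust_zscore(demo)
print(f"Highest-scoring index: {int(np.argmax(demo_scores))}")
```

Because this sketch measures distance from a single global median, a level shift that covers half of a series pulls the median between the two levels, and points far from it on either side can score highly. That is worth keeping in mind when reading the results for the series above.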


In [None]:
# Initialize scorer
scorer = RobustZScoreScorer(epsilon=1e-8)
scorer.fit(y.values)

# Define threshold rule (using 95th percentile)
threshold_rule = ThresholdRule(method="quantile", value=0.95, quantile=0.95)

# Detect anomalies
result = detect_anomalies(y, scorer, threshold_rule)

print("Detection Results (first 10 rows):")
print(result.head(10))
print("\nSummary Statistics:")
print(f"Total points: {len(result)}")
print(f"Anomalies detected: {result['flag'].sum()}")
print(f"Anomaly rate: {result['flag'].mean():.2%}")
print("\nScore statistics:")
print(result['score'].describe())


In [None]:
# Visualize detection results
import matplotlib.pyplot as plt

# First figure: time series with detected anomalies
anomaly_mask = result['flag'] == 1
fig1, ax1 = plot_timeseries(
    y,
    title='Anomaly Detection Results',
    xlabel='Date',
    ylabel='Value'
)
ax1.scatter(y.index[anomaly_mask], y.values[anomaly_mask], 
           color='red', s=100, marker='x', linewidths=2, 
           label=f'Detected Anomalies ({anomaly_mask.sum()})', zorder=5)
ax1.legend()
plt.show()

# Second figure: anomaly scores
# Pass the raw score values so a differing result index cannot introduce NaNs
fig2, ax2 = plot_timeseries(
    pd.Series(result['score'].to_numpy(), index=y.index),
    title='Anomaly Scores Over Time',
    xlabel='Date',
    ylabel='Anomaly Score'
)
threshold_value = np.quantile(result['score'], 0.95)
ax2.axhline(threshold_value, color='r', linestyle='--', linewidth=2, 
           label=f'Threshold ({threshold_value:.2f})')
ax2.fill_between(y.index, threshold_value, result['score'].max(), 
                where=(result['score'] >= threshold_value), 
                alpha=0.3, color='red', label='Anomaly Region')
ax2.legend()
plt.show()


## Comparing Different Threshold Methods

Let's compare absolute and quantile-based thresholding methods.
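
The practical difference between the two rules can be seen on scores that contain no anomalies at all. The sketch below uses plain NumPy. The helpers `flag_by_quantile` and `flag_by_absolute`, and the choice of `>=` for the comparison, are assumptions for illustration rather than the behavior of `ThresholdRule`.

```python
import numpy as np


def flag_by_quantile(scores, q=0.95):
    """Flag scores at or above the q-th quantile of the scores themselves."""
    scores = np.asarray(scores, dtype=float)
    threshold = np.quantile(scores, q)
    return scores >= threshold, threshold


def flag_by_absolute(scores, threshold):
    """Flag scores at or above a fixed, data-independent threshold."""
    scores = np.asarray(scores, dtype=float)
    return scores >= threshold


rng = np.random.default_rng(1)
clean_scores = np.abs(rng.normal(size=200))  # scores with no injected anomalies

quantile_flags, quantile_threshold = flag_by_quantile(clean_scores, q=0.95)
absolute_flags = flag_by_absolute(clean_scores, threshold=10.0)

print(f"Quantile rule flags: {int(quantile_flags.sum())}")
print(f"Absolute rule flags: {int(absolute_flags.sum())}")
```

A quantile rule flags roughly the top `1 - q` fraction of points whether or not anything unusual happened, while an absolute rule flags only points that cross a fixed level and can return no flags at all. Deriving the absolute level from the scores themselves, as the mean + 2σ example in the next cell does, sits between the two: the threshold is applied as a fixed value but still depends on the data.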


In [None]:
# Get scores first
scores = scorer.score(y)
score_values = scores.scores

# Quantile-based threshold
quantile_rule = ThresholdRule(method="quantile", value=0.95, quantile=0.95)
result_quantile = detect_anomalies(y, scorer, quantile_rule)

# Absolute threshold (using mean + 2*std as example)
absolute_threshold = np.mean(score_values) + 2 * np.std(score_values)
absolute_rule = ThresholdRule(method="absolute", value=absolute_threshold)
result_absolute = detect_anomalies(y, scorer, absolute_rule)

# Compare results
comparison = pd.DataFrame({
    'Quantile (95th)': [result_quantile['flag'].sum(), result_quantile['flag'].mean()],
    'Absolute (mean+2σ)': [result_absolute['flag'].sum(), result_absolute['flag'].mean()]
}, index=['Number of Anomalies', 'Anomaly Rate'])

print("Threshold Comparison:")
print(comparison)


In [None]:
# Visualize comparison
import matplotlib.pyplot as plt

# Quantile-based
anomaly_mask_q = result_quantile['flag'] == 1
fig1, ax1 = plot_timeseries(
    y,
    title=f'Quantile Threshold (95th percentile) - {result_quantile["flag"].sum()} anomalies',
    xlabel='Date',
    ylabel='Value'
)
ax1.scatter(y.index[anomaly_mask_q], y.values[anomaly_mask_q], 
           color='red', s=100, marker='x', linewidths=2, zorder=5)
plt.show()

# Absolute-based
anomaly_mask_a = result_absolute['flag'] == 1
fig2, ax2 = plot_timeseries(
    y,
    title=f'Absolute Threshold (mean + 2σ) - {result_absolute["flag"].sum()} anomalies',
    xlabel='Date',
    ylabel='Value'
)
ax2.scatter(y.index[anomaly_mask_a], y.values[anomaly_mask_a], 
           color='red', s=100, marker='x', linewidths=2, zorder=5)
plt.show()


## Summary

In this notebook, we've learned:
1. How to create synthetic time series data with known anomalies
2. How to use `RobustZScoreScorer` for anomaly detection
3. How to apply different threshold rules (quantile vs absolute)
4. How to visualize detection results

In the next notebooks, we'll explore:
- Different statistical scorers (ZScore, IQR)
- Machine learning-based detectors
- PCA-based detection
- Change point detection
- Evaluation metrics and backtesting
