<a href="https://colab.research.google.com/github/subhashpolisetti/Clustering-Techniques-and-Embeddings/blob/main/9_Anomaly_Detection_in_Time_Series_Univariate_%26_Multivariate_with_PyOD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Anomaly Detection in Time Series Data (Univariate and Multivariate)

This notebook demonstrates anomaly detection on both univariate (single-feature) and multivariate (multi-feature) time series data. It uses two detectors from the **PyOD** library, Isolation Forest and K-Nearest Neighbors, and visualizes the results with `matplotlib`.

### Steps covered:
1. **Univariate Anomaly Detection**:
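   - **Isolation Forest (IForest)**: A tree-based model that isolates observations with random splits; points that are isolated after only a few splits receive high anomaly scores. Here it is applied to the values of a single-feature time series.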
   - **Synthetic Data Generation**: We create time series data with seasonal patterns and artificial anomalies inserted.
   - **Anomaly Detection**: The `IForest` model flags anomalies according to a contamination parameter, i.e., the expected proportion of outliers in the data, which sets the score threshold.
   - **Visualization**: The detected anomalies are plotted on top of the original time series so the outliers are easy to spot.

2. **Multivariate Anomaly Detection**:
   - **K-Nearest Neighbors (KNN)**: A model that detects anomalies by finding data points that are far from their neighbors in a multi-dimensional space.
   - **Synthetic Data Generation**: We generate multivariate data and artificially insert outliers.
   - **Feature Scaling**: Features are standardized with `StandardScaler` so that no single feature dominates the distance computation.
   - **Anomaly Detection**: The `KNN` model identifies anomalies by analyzing distances in the feature space.
   - **Visualization**: The multivariate anomalies are visualized in a 2D feature space (first two features) for easy interpretation.

### Code Walkthrough:
- **`detect_univariate_anomalies`**: Detects anomalies in univariate time series data using the Isolation Forest algorithm.
- **`detect_multivariate_anomalies`**: Detects anomalies in multivariate data using the K-Nearest Neighbors algorithm.
- **`generate_sample_timeseries`**: Generates synthetic time series data with seasonal patterns and noise, and inserts anomalies at random points.

### Example Results:
- **Univariate Anomaly Detection**: Anomalies are detected and highlighted on the time series plot.
- **Multivariate Anomaly Detection**: Anomalies are detected in the full feature space, and normal vs. anomalous points are plotted using the first two features.

The notebook uses synthetic data, but the same workflow can be applied to real-world tasks that require detecting unusual patterns or outliers in time series data.

#### Libraries Used:
- **PyOD**: A comprehensive toolkit for detecting outlying objects in multivariate data.
- **Scikit-learn**: For feature scaling with `StandardScaler`.
- **Matplotlib**: For visualizing the data and the detected anomalies.
