CDL-Outlier-Drift-Detection-Practicum

Team Members

Ishu Kalra, Kiran Jyothi Sheena, Chunxue(April) Yang, Andrew Yaholkovsky, Yifei Wang

Objective

The objective of this project was to develop efficient modules for outlier and drift detection in streaming data. Three distinct modules were created to accomplish this objective.

The first module focuses on Concept Drift Detection, which arises when the relationship between dependent and independent variables undergoes changes over time. Multiple algorithms were developed within this module, utilizing both sliding and fixed windows. Importantly, these algorithms operate independently of the true label associated with the incoming data.
The second module addresses Data Drift Detection, which occurs when the distribution of the dependent variable changes over time. To identify data drift, statistical tests such as KL divergence, JS convergence, or PSI are employed. A module was developed to implement an algorithm that detects data drift using a sliding window approach. Additionally, an algorithm was devised to optimize the window size for better detection accuracy.
The third module focuses on Outlier Detection, recognizing that streaming data often contains features that are susceptible to outliers. To address this, a module was designed to detect outliers in the data. This module utilizes the algorithms, iForestASD and Half Space Tree, which have proven to be effective in outlier detection scenarios.

Step into any of the following folders for more details:

Project Structure

│ 
├── Concept Drift Detection
│   ├── Readme.md
│   ├── Semi-supervised_Concept_Drift.ipynb
│   ├── drift_injection
│   │   ├── CreateGradualDriftDatasets.py
│   │   ├── CreateSuddenDriftDatasets.py
│   │   ├── Gradual30high200.csv
│   │   ├── InjectGradualDrift.py
│   │   ├── InjectSuddenDrift.py
│   │   ├── MotionSenseData.csv
│   │   ├── PythonNotebookReference
│   │   │   ├── ArtificialDriftInjection.ipynb
│   │   │   ├── ExperimentingWithMultiflow.ipynb
│   │   │   └── TestingDriftOnDatasets.ipynb
│   │   └── Sudden30high200.csv
│   ├── images
│   │   ├── DriftInjectionExample.png
│   │   ├── GradualDriftInjection.png
│   │   └── InjectedDriftbyExerciseType.png
│   ├── reference
│   │   └── Concept_Drift_no_labels.pdf
│   ├── src
│   │   ├── Dynamic_window.py
│   │   ├── Ensemble_reference_window.py
│   │   ├── Fixed_reference_window.py
│   │   ├── Moving_reference_window.py
│   │   └── __init__.py
│   └── test_dataset
│       ├── NoDrift.csv
│       ├── sub_1_combined_data.csv
│       ├── sub_2_combined_data.csv
│       └── sudden30high200.csv
├── Data Drift Detection
│   ├── Examples
│   │   ├── Example_1.ipynb
│   │   ├── Example_2.ipynb
│   │   └── Window_Variation_Results.ipynb
│   ├── README.md
│   ├── config
│   │   └── default-config.yaml
│   ├── datasets
│   │   ├── N=10000_Subject_1_10_sudden_drift.csv
│   │   ├── N=1000_Subject_2_11_distributed_drift_2000_1000_50.csv
│   │   ├── N=100_Subject_2_11_distributed_drift_2000_1000_50.csv
│   │   ├── N=16000_Subject_1_10_sudden_drift.csv
│   │   ├── N=20000_Subject_1_10_sudden_drift.csv
│   │   ├── N=2000_Subject_2_11_distributed_drift_2000_1000_50.csv
│   │   ├── N=200_Subject_2_11_distributed_drift_2000_1000_50.csv
│   │   ├── N=4000_Subject_1_10_sudden_drift.csv
│   │   ├── N=500_Subject_2_11_distributed_drift_2000_1000_50.csv
│   │   ├── N=8000_Subject_1_10_sudden_drift.csv
│   │   ├── Subject_10_random.csv
│   │   ├── Subject_1_random.csv
│   │   ├── Subject_2_11_distributed_lambda=2000_size=1000_50.csv
│   │   ├── Subject_2_11_distributed_lambda=2000_size=1000_50_indicators.csv
│   │   └── sub_2.csv
│   ├── main.py
│   └── src
│       ├── __init__.py
│       ├── data_drift_detection.py
│       └── window_optimization.py
├── Outlier Detection
│   ├── README.md
│   ├── datasets
│   │   ├── 202301-divvy-tripdata.csv
│   │   ├── 202302-divvy-tripdata.csv
│   │   ├── 202303-divvy-tripdata.csv
│   │   ├── motion_sense_combined_data
│   │   │   ├── sub_10_combined_data.csv
│   │   │   ├── sub_11_combined_data.csv
│   │   │   ├── sub_12_combined_data.csv
│   │   │   ├── sub_13_combined_data.csv
│   │   │   ├── sub_14_combined_data.csv
│   │   │   ├── sub_15_combined_data.csv
│   │   │   ├── sub_16_combined_data.csv
│   │   │   ├── sub_17_combined_data.csv
│   │   │   ├── sub_18_combined_data.csv
│   │   │   ├── sub_19_combined_data.csv
│   │   │   ├── sub_1_combined_data.csv
│   │   │   ├── sub_20_combined_data.csv
│   │   │   ├── sub_21_combined_data.csv
│   │   │   ├── sub_22_combined_data.csv
│   │   │   ├── sub_23_combined_data.csv
│   │   │   ├── sub_24_combined_data.csv
│   │   │   ├── sub_2_combined_data.csv
│   │   │   ├── sub_3_combined_data.csv
│   │   │   ├── sub_4_combined_data.csv
│   │   │   ├── sub_5_combined_data.csv
│   │   │   ├── sub_6_combined_data.csv
│   │   │   ├── sub_7_combined_data.csv
│   │   │   ├── sub_8_combined_data.csv
│   │   │   └── sub_9_combined_data.csv
│   │   ├── revised_sub20_cleandata_forStream.csv
│   │   ├── revised_sub20_data_2ndV.csv
│   │   └── revised_sub20_data_Ishu_modification.csv
│   ├── notebooks
│   │   ├── Dimensionality_Reduction.ipynb
│   │   ├── DivvyBike-Outlier-Detection.ipynb
│   │   ├── ExploringDivvyBikeDataset.ipynb
│   │   ├── Hierarchical_clustering.ipynb
│   │   ├── MotionSenseDataset-HSTree&iForestASD.ipynb
│   │   └── MotionSense_LOF_confusion_matrix_incorporated.ipynb
│   └── src
│       ├── DivvyBike-Outlier-Detection.py
│       ├── Hierarchical_clustering.py
│       ├── MotionSense_HSTree.py
│       ├── MotionSense_LOF_confusion_matrix_incorporated.py
│       └── source
│           ├── functions.py
│           └── iforestasd_scikitmultiflow.py
└── README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CDL-Outlier-Drift-Detection-Practicum

Team Members

Objective

Project Structure

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Concept Drift Detection		Concept Drift Detection
Data Drift Detection		Data Drift Detection
Outlier Detection		Outlier Detection
README.md		README.md

mcelz/CDL-Data-Drift-and-Outlier-Detection

Folders and files

Latest commit

History

Repository files navigation

CDL-Outlier-Drift-Detection-Practicum

Team Members

Objective

Project Structure

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages