oskar-flores/anomally_detect

The basic implementation is based on Chapter 5 (Anomaly Detection in Network Traffic with K-means clustering) of the book Advanced Analytics with Spark.

Algorithms:

K-means
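For reference, here is a sketch of the standard K-means iteration (Lloyd's algorithm) in plain Python. It illustrates the technique and is not the repository's code; the book this project follows uses Spark MLlib's K-means. The names `kmeans`, `nearest` and `distance`, the Euclidean distance, the random initialization and the iteration cap are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance (assumed metric) between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def kmeans(points: List[Point], k: int, iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Lloyd's algorithm: assign every point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids
```

On two well-separated groups such as `[(0, 0), (0, 1), (10, 10), (10, 11)]` with `k=2`, the iteration settles on the centroids `[0.0, 0.5]` and `[10.0, 10.5]`.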

Categorical features are transformed into numerical features using one-hot encoding. All features are then normalized.
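A minimal sketch of this preprocessing in plain Python follows. The README only says "normalized"; the sketch assumes z-score standardization (subtract each column's mean, divide by its standard deviation), which is one common choice. The function names and the `categorical` set of column indices are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Set


def one_hot_encode(rows: List[Sequence[str]], categorical: Set[int]) -> List[List[float]]:
    """Replace each categorical column by one 0/1 column per distinct value;
    all other columns are parsed as floats."""
    # Distinct values per categorical column, sorted so the column order is stable.
    vocab: Dict[int, List[str]] = {c: sorted({row[c] for row in rows}) for c in categorical}
    encoded = []
    for row in rows:
        out: List[float] = []
        for i, value in enumerate(row):
            if i in categorical:
                out.extend(1.0 if value == v else 0.0 for v in vocab[i])
            else:
                out.append(float(value))
        encoded.append(out)
    return encoded


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Assumption: z-score standardization, (x - mean) / stddev per column.
    Constant columns (stddev 0) are only centered."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    stdevs = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) for d in range(dim)]
    return [
        [(r[d] - means[d]) / stdevs[d] if stdevs[d] > 0 else r[d] - means[d] for d in range(dim)]
        for r in rows
    ]
```

In a real pipeline the category vocabulary, means and standard deviations would be computed on the training data and reused for the test set (corrected.gz); otherwise the same raw value could be encoded or scaled differently in the two sets.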

Metrics used:

Sum of the distances between each point and the centroid of its cluster
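This metric can be computed as in the following sketch, which assumes Euclidean distance and assigns each point to its nearest centroid, as K-means does. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sum_of_distances(points: List[Sequence[float]], centroids: List[Sequence[float]]) -> float:
    """Sum over all points of the distance to the nearest (i.e. assigned) centroid."""
    return sum(min(distance(p, c) for c in centroids) for p in points)
```

For a fixed k, lower is better, but the value generally shrinks as k grows, so it is more useful for comparing runs (for example when choosing k) than as an absolute score.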

Anomaly detection is done as follows:

For each cluster, find the maximal distance between its points and its centroid; these values are the per-cluster thresholds.
For a new point, calculate its score (its distance to the nearest centroid); if the score is greater than the threshold of that cluster, the point is an anomaly.
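The two steps above can be sketched in plain Python as follows, again assuming Euclidean distance. This is an illustration rather than the project's own code; `fit_thresholds` and `is_anomaly` are hypothetical names.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance(a: Point, b: Point) -> float:
    """Euclidean distance (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))


def fit_thresholds(points: List[Point], centroids: List[Point]) -> List[float]:
    """Threshold of each cluster = largest distance from any training point
    assigned to it to its centroid."""
    thresholds = [0.0] * len(centroids)
    for p in points:
        i = nearest(p, centroids)
        thresholds[i] = max(thresholds[i], distance(p, centroids[i]))
    return thresholds


def is_anomaly(point: Point, centroids: List[Point], thresholds: List[float]) -> bool:
    """Score = distance to the nearest centroid; anomalous if above that cluster's threshold."""
    i = nearest(point, centroids)
    return distance(point, centroids[i]) > thresholds[i]
```

One consequence of using the per-cluster maximum: in this sketch a cluster that receives no training points keeps a threshold of 0, so almost any new point assigned to it is flagged.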

Data source: https://archive.ics.uci.edu/ml/datasets/KDD+Cup+1999+Data
Test set: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (corrected.gz)

Repository layout:

core
detector
gradle/wrapper
scripts
.gitignore
README.md
build.gradle
gradle.properties
gradlew
gradlew.bat
settings.gradle
