IPython notebook that illustrates the effectiveness of machine learning algorithms for anomaly detection in netflow data (inbound/outbound DDoS, etc.)
Latest commit e5354a1 Sep 29, 2015 @eraclitux Add notes on inaccuracies
data Update data Jun 23, 2015
.gitignore Init Mar 12, 2015
LICENSE Initial commit Mar 12, 2015
README.rst Add notes on inaccuracies Sep 29, 2015
machinelearning-netflow.ipynb Update ipython notebook format Jun 23, 2015
mangle.py Add confusion matrix Mar 18, 2015


Use of machine learning for anomaly detection in netflow data

This notebook can be viewed on GitHub.

A readable version of this IPython notebook can also be found here.


I'm not a data scientist, and I'm sure this process contains errors and inaccuracies. One I'm aware of is that I used Euclidean distance on heterogeneous features. This is formally incorrect, because features with different units and scales contribute unequally to the distance, even though the classification results are consistent.
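
To make the Euclidean distance issue concrete, here is a minimal sketch of one common workaround: rescaling each feature to zero mean and unit variance (z-score) before measuring distances. This is not what the notebook does. The flow records, the feature choice (bytes, packets, duration) and the helper names ``euclidean`` and ``standardize`` are all made up for illustration.

```python
import math
import statistics

# Hypothetical flow records: (bytes, packets, duration in seconds).
# These values are invented for illustration, not taken from the notebook's data.
flows = [
    (1200.0, 10.0, 0.5),
    (1500.0, 12.0, 0.7),
    (900.0, 8.0, 0.4),
    (1300.0, 11.0, 30.0),  # unusually long duration
]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Rescale each column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; use 1.0 for constant columns to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stdevs))
        for row in rows
    ]


scaled = standardize(flows)

# On raw values the byte count has the largest scale, so it dominates the distance.
raw_to_similar = euclidean(flows[0], flows[1])
raw_to_outlier = euclidean(flows[0], flows[3])

# After scaling, every feature contributes on a comparable footing.
scaled_to_similar = euclidean(scaled[0], scaled[1])
scaled_to_outlier = euclidean(scaled[0], scaled[3])

print(f"raw:    similar={raw_to_similar:.2f}  outlier={raw_to_outlier:.2f}")
print(f"scaled: similar={scaled_to_similar:.2f}  outlier={scaled_to_outlier:.2f}")
```

With these invented numbers, the long-duration flow looks closer to the first flow than an ordinary flow does when raw values are used, and farther once the features are standardized. Other choices (min-max scaling, or a distance designed for mixed feature types) may suit real netflow data better.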

If you find other errors, feel free to report them with issues or pull requests.

I no longer have access to any netflow data collector. I'd like to develop a service (and open source it ;-)) that applies ML algorithms to this data to automatically spot anomalies. If someone is interested and has a collector with nfdump installed that I could have SSH access to, please contact me!