A MapReduce framework for network anomaly detection
Requirements:
- A Hadoop cluster with Hadoop Streaming installed
- NumPy installed on all Hadoop nodes
Note: to avoid the burden of installing Hadoop, you can also try Hashdoop with the Matatabi Docker image.
The analysis of traffic traces with Hashdoop consists of four main steps:
- Convert traffic trace to textual format
- Configure Hashdoop
- Hash the trace
- Detect anomalies
Generate text files from a pcap trace
Assuming the pcap trace 200704121400.dump.gz is in the ~/mawi/ directory, convert it to a text file with the following command:
ipsumdump -tsSdDlpF -r ~/mawi/200704121400.dump.gz > ~/mawi/200704121400.ipsum
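The flags passed to ipsumdump request one field per letter. A small parser for the resulting records could look like the sketch below; the field order (timestamp, source address, source port, destination address, destination port, length, protocol, TCP flags) is inferred from the flag string -tsSdDlpF and should be checked against your ipsumdump version, and the sample record is illustrative, not taken from a real trace:

```python
# Assumed field layout for ipsumdump -tsSdDlpF output.
FIELDS = ["ts", "src", "sport", "dst", "dport", "length", "proto", "flags"]

def parse_ipsum_line(line):
    """Split one whitespace-separated ipsumdump record into a dict."""
    return dict(zip(FIELDS, line.split()))

# Illustrative record (TEST-NET addresses, hypothetical values).
record = parse_ipsum_line("1176386400.000000 192.0.2.1 1234 198.51.100.7 80 60 T S")
```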
Upload trace on HDFS
The destination directory should be the same as the tracesHdfsPath variable in hashdoop.conf.
hadoop fs -mkdir -p /user/hashdoop/data/
hadoop fs -put ~/mawi/200704121400.ipsum /user/hashdoop/data/
Configure Hashdoop
The hashdoop.conf file is set by default for the trace and directories used in this readme. Make sure the variables in this file meet your needs:
- tracesHdfsPath: HDFS directory where traffic traces are located
- sketchesHdfsPath: HDFS directory where hashed traffic will be stored
- streamingLib: jar file of your Hadoop Streaming installation
Note that trace names are assumed to be like the ones in the MAWI archive.
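Putting the variables above together, a hashdoop.conf might look like the following fragment. The paths, the hashSize value, and the key/value syntax are illustrative assumptions; only the variable names listed above come from this readme:

```
tracesHdfsPath = /user/hashdoop/data/
sketchesHdfsPath = /user/hashdoop/sketches/
streamingLib = /usr/lib/hadoop-mapreduce/hadoop-streaming.jar
hashSize = 16
```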
Hash the trace
Set the "hashSize" parameter in hashdoop.conf.
This parameter controls the number of sub-traces created with one hash key. Hashdoop uses two hash keys (i.e., the source and the destination address), so it generates twice that many sub-traces.
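The splitting can be sketched as follows. The actual hash function used by Hashdoop is not specified here, so CRC32 is an assumption chosen only for illustration:

```python
import zlib

def bucket(ip_addr, hash_size):
    """Map an IP address string to one of hash_size buckets (sketch;
    Hashdoop's real hash function may differ)."""
    return zlib.crc32(ip_addr.encode()) % hash_size

def sub_trace_keys(src, dst, hash_size):
    """Each packet lands in two sub-traces: one keyed by its source
    address and one by its destination address, giving 2 * hash_size
    sub-traces in total."""
    return ("src-%d" % bucket(src, hash_size),
            "dst-%d" % bucket(dst, hash_size))
```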
Execute the MapReduce hashing code with the runHashing.py script:
Detect anomalies
Set the detection threshold, the time bin, and the output path in the configuration file (hashdoop.conf), then run:
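As an illustration of how a threshold could flag anomalies in a hashed sub-trace (a simplified sketch, not Hashdoop's actual detector), one can compare per-time-bin packet counts against the mean plus a multiple of the standard deviation, using the NumPy dependency listed in the requirements:

```python
import numpy as np

def anomalous_bins(counts, threshold):
    """Return indices of time bins whose packet count exceeds
    mean + threshold * standard deviation (illustrative detector)."""
    counts = np.asarray(counts, dtype=float)
    limit = counts.mean() + threshold * counts.std()
    return np.where(counts > limit)[0]

# A spike in the fifth bin is flagged with threshold = 2.
anomalous_bins([10, 11, 9, 10, 100, 10], 2)  # -> array([4])
```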