/
ip_packet_robust_zscore_outlier_tutorial.txt
51 lines (39 loc) · 1.38 KB
/
ip_packet_robust_zscore_outlier_tutorial.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
This tutorial is outlier detection with robust z score and median average deviation.
Network ip packet data is used as the testcase
Dependent script
================
Checkout the project visitante. Take the script util.py and sampler.py from the project and placeit
in ../lib directory with respect the directory containing store_order.py
Deployment
==========
Please refer to building_uber_jar.txt
1. Create products data
=======================
./ip_packet.py <num_of_packets> > <packet_data_file>
2. Export input to HDFS
=======================
hadoop fs -put <packet_data_file> /user/pranab/nuam/input
3. Export metadata to HDFS
==========================
hadoop fs -put ip_pocket.json /user/pranab/meta/nuam
4. Run mediation map reduce
============================
./etl.sh median
5. Edit properties for median absolute divergence (mad)
=======================================================
op.type=mad
med.file.path=/user/pranab/pack/med/output/part-r-00000
6. Run median absolute divergence map reduce
============================================
./etl.sh medAvDev
7. Find outliers
================
Place JSNOn schema file in HDFS
hadoop fs -put ipPacket.json /user/pranab/meta/nuam/
Run map reduce
./etl.sh validate
The output will be in the HDFS file specified through the config param
invalid.data.file.path
8. Configuration
================
Sample configuration is in etl.properties