$ git clone https://github.com/Stifo/mods2.git
$ cd mods2
$ export MODS2_HOME=$(pwd)
Create an empty certs directory and update its permissions.
$ cd ${MODS2_HOME}
$ mkdir nginx/certs
$ chmod 700 nginx/certs
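If you do not have certificates yet, you can generate a self-signed pair for testing. This is a minimal sketch; the file names server.key and server.crt are assumptions, so match them to whatever your nginx configuration expects:
$ cd ${MODS2_HOME}/nginx/certs
$ openssl req -x509 -nodes -newkey rsa:2048 -days 365 -keyout server.key -out server.crt -subj '/CN=localhost'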
Set vm.max_map_count to at least 262144 (required by Elasticsearch).
$ sudo sysctl -w vm.max_map_count=262144
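This setting does not survive a reboot. To make it persistent, you can add it to a sysctl configuration file (a common approach; adjust the file location to your distribution):
$ echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-mods2.conf
$ sudo sysctl --system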
Create nginx users in order to gain access to the proxied services through the reverse proxy.
$ sudo apt-get install apache2-utils
$ cd ${MODS2_HOME}
$ htpasswd ./nginx/config/.htpasswd [username]
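If ./nginx/config/.htpasswd does not exist yet, add the -c flag to create it. Note that -c overwrites an existing file, so use it only for the first user:
$ htpasswd -c ./nginx/config/.htpasswd [username]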
The following command will start the MODS2 stack with all the required services except the log producer. The log producer needs to be deployed separately on the log server side; see Step 4 for how to deploy it.
$ cd ${MODS2_HOME}
$ docker-compose up -d
Verify that all the services are up and running:
$ cd ${MODS2_HOME}
$ docker-compose ps
Name        Command                          State   Ports
---------------------------------------------------------------------------------------------
elastic     /bin/tini -- /usr/local/bi ...   Up      127.0.0.1:9200->9200/tcp, 9300/tcp
kafka       start-kafka.sh                   Up      0.0.0.0:9092->9092/tcp
kibana      /bin/tini -- /usr/local/bi ...   Up      5601/tcp
mods2       python manage.py runserver ...   Up
mods2_db    docker-entrypoint.sh postgres    Up      5432/tcp
reverse     /docker-entrypoint.sh ngin ...   Up      0.0.0.0:443->443/tcp, 0.0.0.0:80->80/tcp
zookeeper   /bin/sh -c /usr/sbin/sshd ...    Up      2181/tcp, 22/tcp, 2888/tcp, 3888/tcp
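If any service is not in the Up state, you can inspect its logs with standard docker-compose tooling, e.g. for the elastic container:
$ docker-compose logs -f elastic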
Note down the public IP address of the machine where you have just deployed MODS2. You will need it later for the log producer in Step 4.
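One quick way to look it up (assuming the machine has direct internet access; ifconfig.me is a third-party service used here purely for illustration):
$ curl -4 ifconfig.me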
The log producer runs on the log server side and produces log messages to Kafka. It depends on Kafka: prior to running the log producer, Kafka must be up and running (Kafka is deployed in Step 3). The log producer code can be found in the $MODS2_HOME/tools/logserver directory.
$ export MODS2_LOGSERVER_IP=XXX.XXX.XXX.XXX
$ scp -r $MODS2_HOME/tools/logserver ${MODS2_LOGSERVER_IP}:~/
$ ssh ${MODS2_LOGSERVER_IP}
$ cd ~/logserver
$ export MODS2_LOGSERVER=$(pwd)
Follow the log producer documentation on how to deploy the log producer code. Once deployed, it feeds Kafka with zeek/bro log messages from all the monitored files. These log messages appear in the mods Kafka topic.
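You can check that messages are actually arriving by consuming the topic, for example with kafkacat as used later in this guide (replace XXX.XXX.XXX.XXX with the MODS2 stack IP and adjust the broker port to your setup; -o end starts reading at the newest offset):
$ docker run -it --rm edenhill/kafkacat:1.7.0-PRE1 kafkacat -C -b XXX.XXX.XXX.XXX:9092 -t mods -o end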
Now it's time to deploy the log aggregator on the MODS2 stack side. The log aggregator consumes all the log messages from the mods Kafka topic and aggregates them over 10-minute intervals. Return to the MODS2 stack shell, navigate to the ${MODS2_HOME}/tools/kafka-services directory and execute the aggregator.
$ cd ${MODS2_HOME}/tools/kafka-services
$ ./consumer.sh
$ tail -f logs/consumer.log
2021-05-06 14:18:19,944 - __main__ - INFO - messages processed: 352k
2021-05-06 14:18:38,907 - __main__ - INFO - messages processed: 353k
2021-05-06 14:18:57,930 - __main__ - INFO - messages processed: 354k
2021-05-06 14:19:21,962 - __main__ - INFO - messages processed: 355k
Start jupyter-lab in the ${MODS2_HOME}/tools/kafka-services/mods_models directory and execute the train-online.ipynb notebook.
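A typical invocation (assuming JupyterLab is installed on the stack host; the flags below allow access from a remote browser and are optional):
$ cd ${MODS2_HOME}/tools/kafka-services/mods_models
$ jupyter-lab --no-browser --ip=0.0.0.0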
Open Kibana's index patterns page in the browser. You have to use the username and password created earlier in Step 2.2 to access this page. Click on '+ Create index pattern' and create a new index pattern named mods-10m*. Create a new dashboard and add lenses to it (follow the Kibana documentation). Share the created dashboard as an iframe EMBED CODE and put this code in the ${MODS2_HOME}/mods2/live_monitor/templates/live_monitor_10m.html template.
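The embed code generated by Kibana's Share dialog is an iframe roughly of this shape (the src URL and dimensions come from your Kibana instance; this is only an illustrative placeholder):
<iframe src="https://YOUR_HOST/app/dashboards#/view/DASHBOARD_ID?embed=true" height="600" width="800"></iframe>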
After all these steps are done correctly, you should see the current status and predictions in the Live monitor window. The predictions will appear after 10 minutes multiplied by the number of model steps.
- Kafka is not yet hidden behind the nginx proxy. It has a publicly exposed port :9092 and thus requires protection.
- Only a single Kafka node is running so far.
- Only a single instance of the log producer is supported so far. Running an additional instance of the log producer will cause duplicate log messages, which is undesirable.
You can consume the aggregated messages directly from the mods-agg-10m topic with kafkacat:
$ docker run -it --rm edenhill/kafkacat:1.7.0-PRE1 kafkacat -C -b XXX.XXX.XXX.XXX:9093 -t mods-agg-10m