Massively Scalable Anomaly Detection with Apache Kafka, Cassandra and Kubernetes - final code for Instaclustr's Anomalia Machina Blog series
Type Name Latest commit message Commit time
AnomaliaConsumer.java Add files via upload May 21, 2019
AnomaliaMainConsumer.java Add files via upload May 21, 2019
AnomaliaMainProducer.java Add files via upload May 21, 2019
AnomaliaProducerThreaded.java Add files via upload May 21, 2019
AnomaliaProperties.java Add files via upload May 21, 2019
CUSUMChangeDetector.java Add files via upload May 21, 2019
CassandraClient.java Add files via upload May 21, 2019
ChangeDetector.java Add files via upload May 21, 2019
CheckEvent.java Add files via upload May 21, 2019
Dockerfile.consumer Add files via upload May 22, 2019
Dockerfile.producer Add files via upload May 22, 2019
GlobalProperties.java Add files via upload May 21, 2019
KafkaProperties.java Add files via upload May 21, 2019
LICENSE Add files via upload May 22, 2019
MyHeadersMapExtractAdapter.java Add files via upload May 21, 2019
ProvisionAPI.java
README.md Update README.md May 22, 2019
Result.java Add files via upload May 21, 2019
delete.sh
deploy_consumer.sh Add files via upload May 22, 2019
deploy_producer.sh Add files via upload May 22, 2019
deploy_prometheus.sh Add files via upload May 22, 2019
k8_consumer.yaml Add files via upload May 22, 2019
k8_producer.yaml Add files via upload May 22, 2019
k8_prometheusOperator.yaml Add files via upload May 22, 2019
pom.xml Add files via upload May 21, 2019
prometheusBundle.yaml Add files via upload May 22, 2019

README.md

Anomalia Machina - Massively Scalable Anomaly Detection with Apache Kafka, Cassandra and Kubernetes

This is the final example code for the demonstration anomaly detection pipeline from Instaclustr's Anomalia Machina blog series.

Instructions

For the design and more detailed instructions, see the blog series. The basic steps are listed here.

To run the Anomaly Detection pipeline you need to have the following configured and running (all on AWS):

  • Instaclustr Kafka and Cassandra clusters (for Cassandra, no authentication)
  • Connect to the Cassandra cluster using cqlsh, and create the Cassandra keyspace and table (the CQL is in CassandraClient.java)
  • Kafka auto topic creation turned on (so you need to run the producer before the consumer, see below)
  • Kubernetes running in the same region as the Kafka and Cassandra clusters (e.g. on AWS, use EKS)
  • Edit KafkaProperties.java with the Instaclustr Kafka cluster credentials
  • Edit AnomaliaProperties.java with the Instaclustr Provisioning API credentials
  • Either: Configure Kafka and Cassandra cluster firewalls to enable access from Kubernetes (and use public IPs, this assumes you know the IPs of the Kubernetes worker nodes), or set up VPC peering between the Kubernetes cluster and the Instaclustr clusters (and use private IPs)
  • A local Docker and Kubernetes (On a Mac I was using the Docker community edition which comes with Kubernetes)
  • A Docker Hub account (edit the xxx.sh files with the account name)
  • An IDE with the code loaded
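
As a rough illustration of the kind of client settings that go into KafkaProperties.java, here is a minimal sketch that builds a Kafka client configuration with the standard library only. The class name `KafkaClientConfigSketch`, the method `build`, and all argument values are hypothetical. The `SASL_PLAINTEXT` protocol and `SCRAM-SHA-256` mechanism are assumptions, so check the connection details of your own cluster before using them.

```java
import java.util.Properties;

public class KafkaClientConfigSketch {

    // Builds Kafka client properties from placeholder credentials.
    // The security protocol and SASL mechanism below are assumptions;
    // use the values shown for your own Kafka cluster.
    public static Properties build(String bootstrapServers, String username, String password) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "SCRAM-SHA-256");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"" + username + "\" "
                        + "password=\"" + password + "\";");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        Properties props = build("1.2.3.4:9092", "example-user", "example-password");
        // Print only the property names so credentials are not echoed.
        props.stringPropertyNames().forEach(System.out::println);
    }
}
```

The resulting `Properties` object can be passed to a Kafka producer or consumer constructor once the Kafka client library is on the classpath.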

To deploy and run the application:

  • Generate two executable jar files, one called consumer.jar from AnomaliaMainConsumer.java, and one called producer.jar from AnomaliaMainProducer.java
  • Start 1 or more Kubernetes worker nodes in AWS (using auto scaling groups)
  • Deploy Prometheus using the deploy_prometheus.sh script
  • Deploy the producer using the deploy_producer.sh script
  • Deploy the consumer using the deploy_consumer.sh script
  • Look at the Prometheus metrics in a browser (you'll need to copy the public IP address of one of the Kubernetes worker nodes from the AWS console into your browser), e.g. 1.2.3.4:30123
  • The producer load and consumers can be scaled by increasing the number of Kubernetes worker nodes and increasing the number of pods for producers and consumers. Some tuning of the parameters in AnomaliaProperties.java will be required to ensure optimal throughput.
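
When scaling consumer pods, one general Kafka constraint is worth keeping in mind: within a consumer group, each topic partition is assigned to at most one consumer, so consumers beyond the partition count sit idle. The sketch below illustrates that arithmetic only. The class and method names are hypothetical, and it does not model the tuning parameters in AnomaliaProperties.java.

```java
public class ConsumerScalingSketch {

    // Number of consumers in a single group that can actually receive data:
    // each partition is assigned to at most one consumer in the group.
    public static int activeConsumers(int partitions, int consumers) {
        if (partitions < 0 || consumers < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.min(partitions, consumers);
    }

    // Consumers that would receive no partitions and therefore do no work.
    public static int idleConsumers(int partitions, int consumers) {
        return consumers - activeConsumers(partitions, consumers);
    }

    public static void main(String[] args) {
        int partitions = 12; // hypothetical topic partition count
        int consumers = 20;  // hypothetical total number of consumers in the group
        System.out.println("active=" + activeConsumers(partitions, consumers)
                + " idle=" + idleConsumers(partitions, consumers));
    }
}
```

In practice this means the topic needs at least as many partitions as the number of consumers you plan to run in the group.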

Note that the Prometheus instrumentation is present and used in the final Kubernetes production environment. However, the OpenTracing/Jaeger tracing instrumentation is present but unused in the Kubernetes environment (you would have to run a Jaeger Operator to use it).

Instaclustr Open Source Project Status: SAMPLE
