Anomalia Machina - Massively Scalable Anomaly Detection with Apache Kafka, Cassandra and Kubernetes

This is the final example code for the demonstration Anomaly Detection pipeline for Instaclustr's Anomalia Machina Blog series:

Instructions

For the design and more detailed instructions, see the blogs (above). Here are the basic steps.

To run the Anomaly Detection pipeline you need to have the following configured and running (all on AWS):

Instaclustr Kafka and Cassandra clusters (for Cassandra, no authentication)
Connect to the Cassandra cluster using cqlsh, and create the Cassandra keyspace and table (the CQL is in CassandraClient.java)
Kafka auto topic creation turned on (so you need to run the producer before the consumer, see below)
Kubernetes running in the same region as the Kafka and Cassandra clusters (e.g. on AWS, use EKS)
Edit KafkaProperties.java with the Instaclustr Kafka cluster credentials
Edit AnomaliaProperties.java with the Instaclustr Provisioning API credentials
Either configure the Kafka and Cassandra cluster firewalls to allow access from Kubernetes (using public IPs; this assumes you know the IPs of the Kubernetes worker nodes), or set up VPC peering between the Kubernetes cluster and the Instaclustr clusters (using private IPs)
A local Docker and Kubernetes installation (on a Mac, I used Docker Community Edition, which comes with Kubernetes)
A Docker Hub account (edit the xxx.sh files with your account name)
An IDE with the code loaded
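
Editing KafkaProperties.java amounts to giving the Kafka clients the cluster's broker addresses and credentials. The sketch below is a hypothetical stand-in, not the repository's actual class: it builds a `java.util.Properties` object using standard Kafka client configuration keys, assuming the cluster uses SASL/SCRAM authentication without client-to-broker encryption. The class name `KafkaConfigSketch`, the broker IPs, the username and the password are all placeholders.

```java
import java.util.Properties;

// Hypothetical sketch of the kind of settings KafkaProperties.java supplies.
// Broker IPs and credentials are placeholders; SASL/SCRAM without TLS is an
// assumption about how the Instaclustr Kafka cluster is configured.
public class KafkaConfigSketch {

    // Builds client properties usable by both the producer and the consumer.
    static Properties clientProperties(String brokers, String user, String password) {
        Properties props = new Properties();
        // Comma-separated host:port list (public or private IPs, per your network setup)
        props.put("bootstrap.servers", brokers);
        // Authenticate with SCRAM over SASL (assumed cluster setting)
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "SCRAM-SHA-256");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"" + user + "\" password=\"" + password + "\";");
        return props;
    }

    public static void main(String[] args) {
        Properties props = clientProperties("10.0.0.1:9092,10.0.0.2:9092", "kafka-user", "changeme");
        // Print the settings in sorted key order
        props.stringPropertyNames().stream().sorted()
            .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

If client-to-broker encryption is enabled on the cluster, `SASL_SSL` plus truststore settings would be needed instead of `SASL_PLAINTEXT`.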

To deploy and run the application:

Generate two executable jar files: consumer.jar from AnomaliaMainConsumer.java, and producer.jar from AnomaliaMainProducer.java
Start one or more Kubernetes worker nodes in AWS (using Auto Scaling groups)
Deploy Prometheus using the deploy_prometheus.sh script
Deploy the producer using the deploy_producer.sh script
Deploy the consumer using the deploy_consumer.sh script
View the Prometheus metrics in a browser (copy the public IP address of one of the Kubernetes worker nodes from the AWS console into your browser, followed by port 30123), e.g. 1.2.3.4:30123
The producer load and the number of consumers can be scaled by adding Kubernetes worker nodes and increasing the number of producer and consumer pods. Some tuning of the parameters in AnomaliaProperties.java will be required to achieve optimal throughput.
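
When scaling consumer pods, one Kafka constraint is worth keeping in mind: within a consumer group, each partition is assigned to at most one consumer, so consumers beyond the topic's partition count sit idle. The sketch below assumes that every consumer thread runs its own Kafka consumer and that all of them join one consumer group; the class name, the threads-per-pod value and the partition count are hypothetical, not values taken from AnomaliaProperties.java.

```java
// Hypothetical sketch: how many consumer threads actually receive partitions
// as consumer pods are scaled. Assumes each thread runs its own Kafka consumer
// and that all of them join the same consumer group.
public class ConsumerScalingSketch {

    // Total number of Kafka consumers across all consumer pods.
    static int totalConsumers(int pods, int threadsPerPod) {
        return pods * threadsPerPod;
    }

    // Each partition is assigned to at most one consumer in a group,
    // so at most `partitions` consumers can be doing useful work.
    static int activeConsumers(int pods, int threadsPerPod, int partitions) {
        return Math.min(totalConsumers(pods, threadsPerPod), partitions);
    }

    // Consumers that join the group but receive no partitions.
    static int idleConsumers(int pods, int threadsPerPod, int partitions) {
        return totalConsumers(pods, threadsPerPod) - activeConsumers(pods, threadsPerPod, partitions);
    }

    public static void main(String[] args) {
        int threadsPerPod = 5; // hypothetical value
        int partitions = 12;   // hypothetical topic partition count
        for (int pods = 1; pods <= 4; pods++) {
            System.out.printf("pods=%d consumers=%d active=%d idle=%d%n",
                pods,
                totalConsumers(pods, threadsPerPod),
                activeConsumers(pods, threadsPerPod, partitions),
                idleConsumers(pods, threadsPerPod, partitions));
        }
    }
}
```

With these example values, a third pod adds five consumers but only two of them get partitions, leaving three idle. Because the topic is created by auto topic creation, its initial partition count comes from the broker's `num.partitions` setting, so that value bounds how far the consumer pods can usefully be scaled.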

Note that the Prometheus instrumentation is present and used in the final Kubernetes production environment. However, the OpenTracing/Jaeger tracing instrumentation is present but unused in the Kubernetes environment (you would have to run a Jaeger Operator to use it).
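
Prometheus collects metrics by periodically scraping an HTTP endpoint that returns them in its plain-text exposition format. The sketch below is only an illustration of what such an endpoint returns, not the repository's instrumentation: it serves one hypothetical counter, `anomalia_events_checked_total`, using the JDK's built-in `com.sun.net.httpserver.HttpServer` on an assumed port 1234. The class name, metric name and port are all assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stdlib-only sketch of a Prometheus scrape endpoint.
public class MetricsEndpointSketch {

    // Hypothetical counter, incremented once per event checked for anomalies.
    static final AtomicLong eventsChecked = new AtomicLong();

    // Renders the counter in the Prometheus text exposition format.
    static String render() {
        return "# HELP anomalia_events_checked_total Events checked for anomalies.\n"
             + "# TYPE anomalia_events_checked_total counter\n"
             + "anomalia_events_checked_total " + eventsChecked.get() + "\n";
    }

    // Starts an HTTP server that answers requests to /metrics on the given port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(1234);
        // Simulate some work so the counter has something to report
        for (int i = 0; i < 100; i++) {
            eventsChecked.incrementAndGet();
        }
        System.out.println("Serving metrics on http://localhost:1234/metrics");
    }
}
```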

Instaclustr Open Source Project Status: SAMPLE

For further information, see: https://www.instaclustr.com/support/documentation/announcements/instaclustr-open-source-project-status/

