GSoC 2016 Proposal: Kafka source in Java (iammehrabalam)

Md Mehrab Alam edited this page Mar 24, 2016 · 4 revisions

Kafka source in Java

Description

Syslog-ng is used to manage log messages and implement centralized logging. It collects log messages from different sources and forwards them to a log server (i.e., a log aggregator), where they can be filtered and processed.

Project benefits

Kafka is a publish-subscribe messaging system that can handle high throughput and is easy to scale horizontally, which makes it well suited to transporting large amounts of log data.

A few advantages of this project for Syslog-ng

  • Scalability: Syslog-ng clients send logs to the Kafka broker(s), and the consumers listening to the different topics work independently. When consumers lag, or the rate of produced messages increases, adding more consumers (and more machines) increases throughput without losing any messages.

  • Persistence: If a consumer crashes or a connection fails, the logs stay available in Kafka until the retention period expires, so the log messages can be recovered without loss.

    etc
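The persistence point can be illustrated with a small in-memory model. This is only a sketch using the Java standard library, not Kafka itself: the class `RetainedLogSketch` and its methods are hypothetical names, a list index stands in for a partition offset, and retention expiry is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one Kafka partition: an append-only list whose index plays the role of the offset. */
final class RetainedLogSketch {
    private final List<String> records = new ArrayList<>();
    private long committedOffset = 0; // next offset the consumer should read after a restart

    /** Producer side: messages are kept even after they have been read. */
    void append(String message) {
        records.add(message);
    }

    /** Consumer side: returns up to max records starting at the committed offset, without committing them. */
    List<String> poll(int max) {
        int from = (int) Math.min(committedOffset, records.size());
        int to = Math.min(records.size(), from + max);
        return new ArrayList<>(records.subList(from, to));
    }

    /** Marks count records as fully processed, so a restarted consumer skips them. */
    void commit(int count) {
        committedOffset = Math.min(committedOffset + count, records.size());
    }

    long committedOffset() {
        return committedOffset;
    }
}
```

If a consumer polls a batch and crashes before calling `commit`, the next `poll` returns the same batch again, because the records are still retained and the committed offset has not moved. Nothing is lost, but the uncommitted batch is delivered a second time.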

This Stack Overflow problem (not the best example) can be solved with Kafka: create topics whose names correspond to hostnames or IP addresses. Each server sends its logs to its own topic, and each topic is read by a different consumer.

Common use cases of Apache Kafka
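A minimal sketch of the host-to-topic mapping follows, using only the Java standard library. The class `HostTopics`, the method `topicFor`, and the `syslog.` prefix are hypothetical choices for illustration. The sketch assumes that topic names may contain only ASCII letters, digits, `.`, `_` and `-`, and that they are limited to 249 characters, which is what recent Kafka versions enforce; the limit should be checked against the Kafka version in use.

```java
import java.util.Locale;

/** Maps a hostname or IP address to a topic name (hypothetical naming scheme). */
final class HostTopics {
    private static final String PREFIX = "syslog.";   // assumed prefix, not part of the proposal
    private static final int MAX_TOPIC_LENGTH = 249;  // assumed limit, see the note above

    static String topicFor(String hostOrIp) {
        if (hostOrIp == null || hostOrIp.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        // Replace every character outside the allowed set, e.g. ':' in IPv6 addresses.
        String cleaned = hostOrIp.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9._-]", "_");
        String topic = PREFIX + cleaned;
        return topic.length() > MAX_TOPIC_LENGTH ? topic.substring(0, MAX_TOPIC_LENGTH) : topic;
    }
}
```

With this scheme, `HostTopics.topicFor("10.0.0.5")` gives `syslog.10.0.0.5`. Two hosts whose names differ only in replaced characters would map to the same topic, so a real implementation would need to decide how to handle such collisions.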

  1. Log aggregation
  2. Activity tracking
  3. Real-time analysis of data
  4. etc..

Knowledge Required

  1. Familiarity with Java at a strong user level
  2. Knowledge of Apache Kafka
  3. Familiarity with Syslog-ng
  4. Git version control and GitHub

Goal of the project

The goal of this project is to write a consumer (in Kafka terminology) that reads messages from Kafka broker(s) or a cluster without duplicating messages and that can be configured flexibly.
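As a rough sketch of what flexible configuration could look like, the snippet below builds a `java.util.Properties` object with a few settings of the Kafka 0.9 Java consumer and lets user-supplied options override them. The class `ConsumerSettings` and its methods are hypothetical; the property keys and the `StringDeserializer` class name come from the Kafka consumer documentation. The chosen defaults are assumptions, not decisions made in this proposal.

```java
import java.util.Properties;

/** Builds consumer settings; defaults can be overridden by user configuration. */
final class ConsumerSettings {
    private static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    static Properties defaults(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Commit offsets manually, only after a message has been handed over,
        // to narrow the window in which a restart re-reads messages.
        props.setProperty("enable.auto.commit", "false");
        // Start from the oldest retained message when no committed offset exists.
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer", STRING_DESERIALIZER);
        props.setProperty("value.deserializer", STRING_DESERIALIZER);
        return props;
    }

    /** Returns a new Properties object in which overrides win over base. */
    static Properties withOverrides(Properties base, Properties overrides) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(overrides);
        return merged;
    }
}
```

The resulting object could be passed to the `KafkaConsumer` constructor that accepts `Properties`. Disabling auto-commit does not by itself guarantee that no message is read twice; it only gives the consumer control over when an offset counts as processed.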

Timeline

Rest of March & April

  • Getting familiar with Syslog-ng
  • Set up Syslog-ng to learn how Syslog-ng clients send and receive log messages
  • Discuss with the mentor and get feedback on how exactly we will proceed
  • Java Kafka API

May

  • Set up a Kafka cluster or broker(s)
  • Start coding
  • Review from mentor

June

  • Incorporate all possible configuration options for the Kafka consumer
  • Test and improve the consumer (make sure it receives data correctly and handles all other situations well)
  • Discussion with mentor
  • Integration of Kafka Consumer with Syslog-ng

July

  • Testing
  • Refactor code and bug fixes
  • Pre-final release
  • Get feedback from mentor and discussion

August

  • Review code and remove bugs if any
  • Get review from mentor
  • Documentation and user guide
  • Final release

Availability

I won’t be available from 3rd May to 12th May because of my college exams, but I will make up for this by working in advance. Before and after that period I am available and will give the maximum possible time.

About Me

My name is Md Mehrab Alam. I am currently a 4th-year Computer Engineering student at the Faculty of Engineering, Jamia Millia Islamia, New Delhi, India. I am a developer with skills in Python, C, C++, Java, etc.

I would like to contribute to the Syslog-ng community under GSoC 2016. I consider myself a good candidate for this project because I have previously worked on a project with the same use case that used Apache Kafka.

LinkedIn: https://www.linkedin.com/in/iammehrabalam

Github: https://github.com/iammehrabalam

Email: md.mehrab@gmail.com

Phone no: +91 9654835865

References

http://kafka.apache.org/08/uses.html

http://kafka.apache.org/documentation.html

https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
