
GSoC 2018 Proposal: ElasticSearch destination: native(C) REST API (definit3)

Vivek Raj edited this page Apr 5, 2018 · 3 revisions

Introduction

ElasticSearch destination: native(C) REST API

Syslog-ng is a log-management tool that can collect logs from a wide range of sources, parse, classify, and correlate them, and deliver them to various destinations.

ElasticSearch is a distributed, JSON-based search and analytics engine. Syslog-ng already has an ElasticSearch destination implemented in Java. This project will implement a REST ElasticSearch client library in C and use it to build a new ElasticSearch destination driver, thereby removing the dependency on Java.

Benefits to the Community

Elasticsearch is gaining momentum as the ultimate destination for log messages. There are two major reasons for this:

  • You can store arbitrary name-value pairs coming from structured logging or message parsing.
  • You can use Kibana as a search and visualization interface.

Syslog-ng currently implements its ElasticSearch destination in Java. Since most of the syslog-ng code is written in C, implementing the ElasticSearch destination driver in C should help it use resources more efficiently and will remove the dependency on Java.

Why syslog-ng?

Our seniors introduced us to this open source community a few months back, and I absolutely loved it. That is why I want to spend my summer working on an open source project, and on syslog-ng in particular: this community was more than willing to help newcomers like me get started with detailed guidance, and now it is my turn to work for the community.

Originally, I planned to work only on crypto-parser. That led me to explore syslog-ng extensively, and I came across ElasticSearch along the way. Diving a bit into ElasticSearch made me realise how useful a tool it is for storing and analysing logs. It would be my pleasure to work on this project over the summer.

Goals of the Project

The goal of the project is to create a fully functional ElasticSearch destination driver in C. This includes:

  • As there is no official or community-contributed ElasticSearch client in C, the first goal is to implement a REST ElasticSearch client library in C.
  • Using the created ElasticSearch client, implement a new ElasticSearch destination driver written in C.
  • Debian packaging of the new ElasticSearch destination driver.

Implementation

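The primary, most commonly used HTTP verbs (or methods, as they are properly called) are POST, GET, PUT, PATCH, and DELETE. These correspond to the create, read, update, and delete (CRUD) operations: POST creates, GET reads, PUT and PATCH update (full replacement and partial modification, respectively), and DELETE deletes.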
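The verb-to-operation mapping can be sketched as a small C lookup. Since five verbs cover four operations, this sketch treats both PUT and PATCH as updates (full replacement versus partial change). The enum and function names are hypothetical and exist only for illustration; they are not part of syslog-ng or of any Elasticsearch library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum for illustration; not part of syslog-ng or Elasticsearch. */
typedef enum
{
  ES_OP_CREATE,
  ES_OP_READ,
  ES_OP_UPDATE_FULL,    /* replace the whole resource */
  ES_OP_UPDATE_PARTIAL, /* modify part of the resource */
  ES_OP_DELETE
} EsCrudOp;

/* Return the HTTP method conventionally used for a CRUD operation,
 * or NULL if the operation is not one of the enum values. */
static const char *
es_http_method_for(EsCrudOp op)
{
  switch (op)
    {
    case ES_OP_CREATE:         return "POST";
    case ES_OP_READ:           return "GET";
    case ES_OP_UPDATE_FULL:    return "PUT";
    case ES_OP_UPDATE_PARTIAL: return "PATCH";
    case ES_OP_DELETE:         return "DELETE";
    }
  return NULL;
}
```

A client library could use such a lookup to pick the request method before handing the request to an HTTP library such as libcurl.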

Elasticsearch provides a comprehensive and powerful REST API for interacting with a cluster. With the ElasticSearch REST API, our application accesses the ElasticSearch cluster over HTTP. Among the things that can be done with the API are the following:

  • Check your cluster, node, and index health, status, and statistics
  • Administer your cluster, node, and index data and metadata
  • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
  • Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others

ElasticSearch APIs:

  • To check cluster health, we can use the cat API of ElasticSearch.
  • ElasticSearch provides Document APIs, with which we can perform index, get, delete, and update operations.
  • Bulk API: the bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed.
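
To make the bulk API concrete: a bulk request body is newline-delimited JSON, where each index operation is an action line followed by the document itself, and every line ends with a newline. The sketch below only assembles such a body in memory; the function name is hypothetical, the index name is inserted without escaping, and the document is assumed to be already serialized as single-line JSON. Elasticsearch releases contemporary with this proposal also expect a document type, which is omitted here for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one "index" action to a bulk request body (hypothetical helper).
 * buf/cap: output buffer and its capacity; *len: bytes used so far.
 * index: target index name (assumed to need no JSON escaping).
 * doc: an already-serialized, single-line JSON document.
 * Returns 0 on success, -1 if the entry does not fit; on failure the
 * string already in the buffer is left as it was.
 */
static int
es_bulk_append_index(char *buf, size_t cap, size_t *len,
                     const char *index, const char *doc)
{
  assert(buf && len && index && doc && *len <= cap);

  int n = snprintf(buf + *len, cap - *len,
                   "{\"index\":{\"_index\":\"%s\"}}\n%s\n", index, doc);
  if (n < 0 || (size_t) n >= cap - *len)
    {
      if (*len < cap)
        buf[*len] = '\0'; /* roll back the partial write */
      return -1;
    }
  *len += (size_t) n;
  return 0;
}
```

The resulting buffer would then be sent in a single POST to the cluster's `_bulk` endpoint, which is where the indexing speed-up comes from: many log messages share one HTTP round trip.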

Depending on the functionality we need in the new destination, we can use several other APIs that Elasticsearch provides.

In the existing Java-based destination, the syslog-ng OSE application sends the log messages to the official Elasticsearch client library, which forwards the data to the Elasticsearch nodes.


Further Details about ElasticSearch destination:

Syslog-ng configuration is written into files using the syslog-ng configuration language. We will have to add a new keyword for the ElasticSearch destination written in C. For example, we can define ‘elasticsearch-c’ for our new destination.

elasticsearch-c can have various options to deliver the required functionality. For example:

  • cluster()

    Description: Specifies the name of the Elasticsearch cluster, for example,

      cluster("my-elasticsearch-cluster")
    
  • cluster-url()

    Description: Specifies the URL of the Elasticsearch cluster, for example,

      cluster-url("http://192.168.10.10:9200")
    
  • template()

    Description: The message as sent to the Elasticsearch server.

    To add a @timestamp field to the message, for example, to use with Kibana, include the @timestamp=${ISODATE} expression in the template. For example:

      template("$(format-json --scope rfc5424 --exclude DATE --key ISODATE @timestamp=${ISODATE})")
    

Several other options/features can also be implemented after discussion with the mentor.
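
On the C side, the options listed above could be held in a small structure, and the request URL derived from cluster-url(). The following is only a sketch under stated assumptions: the struct, its field names, and the helper function are hypothetical, and the real driver would follow syslog-ng's own driver and option conventions. The `_bulk` path is the Elasticsearch bulk endpoint.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the options discussed above. */
typedef struct
{
  const char *cluster;      /* value of cluster("...") */
  const char *cluster_url;  /* value of cluster-url("...") */
  const char *template_str; /* value of template("...") */
} EsDestOptions;

/*
 * Build the bulk endpoint URL from cluster-url(), tolerating trailing slashes.
 * Returns 0 on success, -1 if cluster_url is unset or empty, or if the
 * output buffer is too small.
 */
static int
es_build_bulk_url(const EsDestOptions *opts, char *out, size_t cap)
{
  assert(opts && out);

  if (!opts->cluster_url || opts->cluster_url[0] == '\0')
    return -1;

  size_t n = strlen(opts->cluster_url);
  while (n > 0 && opts->cluster_url[n - 1] == '/')
    n--; /* drop trailing slashes so the result has exactly one separator */

  int written = snprintf(out, cap, "%.*s/_bulk", (int) n, opts->cluster_url);
  return (written < 0 || (size_t) written >= cap) ? -1 : 0;
}
```

With cluster-url("http://192.168.10.10:9200"), this helper would produce the URL that the client library posts bulk requests to.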

Our new ElasticSearch destination must be defined in the syslog-ng OSE configuration file and used in the log statement.

Declaration:

destination d_name {
    elasticsearch-c(
        template()
        cluster()
    );
};

Example: Using elasticsearch-c

In the following example, the log statement connects the source and the new destination.

source mysrc {
        file("/myfolder/examplesrc");
};

destination mydest {
    elasticsearch-c(
        cluster("syslog-ng")
        template("$(format-json --scope rfc5424 --exclude DATE --key ISODATE @timestamp=${ISODATE})")
    );
};

log {
    source(mysrc);
    destination(mydest);
};

Knowledge Areas Required

  • Syslog-ng

    I have been using syslog-ng for the last 1.5 months and have also contributed to the project in this period. I have a thorough understanding of syslog-ng configuration and of how syslog-ng works in general.

  • GitHub

    Proficient.

  • Linux

    Proficient. I have been using Linux for years.

  • C

    Proficient. C was the first programming language I learnt, so I am quite comfortable with it. I have been coding in C for years and have in-depth knowledge of algorithms and data structures.

  • ElasticSearch

    Familiar.

  • Autotools & CMake, Flex & Bison

    Familiar. I expect to get a firm grasp of them soon.

Timeline

March and April

  • Increase familiarity with syslog-ng.
  • Increase familiarity with the syslog-ng codebase by fixing bugs.
  • Use this period to gain a thorough understanding of Elasticsearch.
  • Practise using libcurl to gain in-depth knowledge of it.

Community Bonding Period: April 23 - May 14

  • Plan smaller details of the module.
  • Try out several APIs of ElasticSearch.

Week 1: May 14 - May 20

  • Discuss with the mentor the options and flags that elasticsearch-c will have.
  • Plan the implementation procedure using the mentor’s feedback.

Week 2-4: May 21 - June 10

  • Implement the REST ElasticSearch client library.
  • Implement a basic destination driver with minimal configuration.

Week 5: June 11 - June 17

  • Code review and debugging of all the work so far.

Week 6-7: June 18 - July 1

  • Implement various discussed options and flags of elasticsearch-c.

Week 8: July 2 - July 8

  • Code review, integration, and testing of all the work so far. Also a buffer for uncompleted tasks.

Week 9: July 9 - July 15

  • Write the grammar, autotools, and CMake files.
  • Integrate the project.

Week 10: July 16 - July 22

  • Write documentation and debug code.
  • Debian packaging of the new driver.

Week 11: July 23 - July 29

  • Period for uncompleted work.

Final Week: July 30 - August 6

  • Improve documentation.
  • Final Release.

About Me

I am Vivek, a sophomore undergraduate studying Computer Science and Engineering at the Indian Institute of Technology Patna. I am passionate about programming, and I love puzzle solving and competitive programming. I have done various projects and have solid knowledge of several domains.

I am completely free this summer break and have no other work commitments, so I can easily put in 50 hours per week, or even more if required. I am therefore confident that I can complete this project within the time period.

Contributions to syslog-ng so far:

I have been contributing to syslog-ng for the last 2 months in one way or another. I have been very active on Gitter and have also helped a few newcomers. I have contributed directly to the project and have also helped improve the documentation.

PRs created:

Contact Information

Name: Vivek Raj

GitHub: definit3

Email: iamvrajj@gmail.com / vivek.cs16@iitp.ac.in

Education: Undergraduate, Bachelor of Technology, Computer Science and Engineering, Indian Institute of Technology Patna.

Location: Patna, India (GMT+05:30)

Phone Number: +91 7766899978

