
GSoC2021 Idea & Project list

Kókai Péter edited this page Apr 14, 2021 · 9 revisions

Table of contents

  1. Guidelines

    1. Hosting, license and other bits of information
    2. Adding a new idea
    3. Submitting a proposal
  2. Ideas

    1. Add regexp-parser()
    2. Create MQTT source and/or destination
    3. Investigate the current state of syslog-ng on MacOS
    4. Add support for reingesting orphaned disk queue files
    5. Add support for proper lists into json-parser()
    6. Extend zap logger (golang) with structured syslog-ng encoder
    7. Add EOS (Exactly Once Semantics) support to kafka-c destination driver
  3. Ideas that still need detailing


Guidelines

The ideas herein were contributed by the syslog-ng OSE community: developers, users, and interested students. Some of them may be vague or incomplete. If you are a student and would like to apply to the Google Summer of Code, either ask about any of these ideas on the mailing list or contact the person proposing it.

Being accepted as a Google Summer of Code (GSoC) student is not easy: the program is competitive. Research the desired topic in depth, and contact the mentor and the community. If you have a new idea and would like to add it to the list, talk to the community at large and to the developers, to ensure that there will be a mentor for the project if it is selected.

In case no specific contact is given for a particular idea, questions can be asked on the mailing list or on Gitter.

Hosting, license and other bits of information

As required by the Google Summer of Code program, all contributions must be available under an open source license. In the case of syslog-ng, we use two licenses: the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL). The outcome of the GSoC projects will need to use one of these licenses. Consult the mentor of the idea for details.

We also prefer to do development in the open, with communication happening on the mailing list or on Gitter, and code hosted on GitHub, where the main repository is. Depending on the proposal, students will be asked to fork syslog-ng.

Adding a new idea

Before adding a new idea, consult the community and the mentors (see above), then follow the template set by other ideas: a title, a brief description, expected results, skills required, difficulty, and topics the student may learn. See the existing ideas below.

Submitting a proposal

To submit your proposal, create a new Wiki page for it, as described in the GSoC 2021 Proposals document. Do not forget to also record your proposal on the Google Summer of Code page by posting a link to the previously created Wiki page.


Ideas

Add regexp-parser()

Brief description:

syslog-ng has traditionally used regexps for filtering purposes (e.g. using a regexp to route a message to a different destination). However, regular expressions can also be useful (although fairly slow) for extracting variable parts of log messages. This is possible today, but is a bit unintuitive:

# example, using a regexp to extract name value pairs from a message

filter f_extract {
    # if the pattern matches:
    #     $0 - populated with the entire match
    #     $1 - populated with the value enclosed in parentheses (the first capture group)
    program("postfix/([a-z]+)" flags(store-matches));
};

Difficulties with the syntax above:

  • you need to use flags(store-matches), as regexp field extractions are not stored by default
  • the names of the fields are in the "matches" namespace ($0, $1, $2 ...), which can be difficult to remember and which get overwritten by the next regexp
  • one can set() a "normal" name-value pair using the value of $1 with a rewrite rule, but the config quickly becomes cumbersome/unreadable
  • one can also set a "normal" name-value pair by employing the PCRE named capture group syntax (see the named subpatterns section in the pcrepattern(3) man page), but dots cannot be used in these names, although they are common in the names of syslog-ng name-value pairs
  • filters should not be mutating the message (which they do with flags(store-matches))

Solution:

  • the goal is to create a regexp-parser() construct that is a parser, not a filter
  • the regexp-parser() should
    • accept multiple regular expressions, which are processed in order; the first one that matches stops the processing
    • support the prefix() option common to other parsers, so that named subpatterns can get a prefix in their names
    • drop the message by returning FALSE if none of the regexps match
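
From the user's side, the parser described above might be configured along the lines of the sketch below. regexp-parser() is the feature being proposed, so this is not working configuration: the option names patterns() and template() are assumptions made for illustration (only prefix() is named in the idea), and the source, destination, file path and field names are made up as well.

```
# Hypothetical syntax: regexp-parser() is the construct proposed above;
# patterns() and template() are assumed option names.
source s_local { system(); internal(); };

parser p_mail {
    regexp-parser(
        # assumed: match against the program name instead of the message
        template("${PROGRAM}")
        # tried in order, the first successful match stops the processing
        patterns(
            "^postfix/(?<component>[a-z]+)"
            "^sendmail-(?<component>[a-z]+)"
        )
        # the named subpattern would be stored as ${.mail.component}
        prefix(".mail.")
    );
};

destination d_mail {
    file("/var/log/mail-components.log"
         template("${ISODATE} ${.mail.component} ${MESSAGE}\n"));
};

log {
    source(s_local);
    # messages matching none of the patterns would be dropped here
    parser(p_mail);
    destination(d_mail);
};
```

Compared with the filter-based example, the extracted value gets a dotted, stable name through prefix() instead of living in $1, and the message is modified by a parser rather than by a filter.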

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler

Difficulty: medium

Deliverables of the project:

  • the functioning feature, tests, documentation
  • performance test (how many messages can be processed by a single regexp per second)

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C
  • Familiarity with parallel programming/mutexes

What the student will learn:

  • The basics of the PCRE library, how to compile/execute regular expressions
  • How to get code reviewed and merged in an open source project

Create MQTT source and/or destination

Brief description:

MQTT is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. (http://mqtt.org/)

The student should create an MQTT subscriber (source) and/or publisher (destination) in syslog-ng, built on top of the existing TCP/IP implementation.
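
To make the expected result more concrete, a configuration using the new drivers could look roughly like the sketch below. The mqtt() driver name and the address(), topic() and template() options are assumptions made for illustration, as are the broker address, the topic names and the file path; the actual option set is part of the design work of the project.

```
# Hypothetical syntax: mqtt() is the driver proposed above,
# all of its options shown here are assumed names.
source s_mqtt {
    mqtt(
        address("tcp://localhost:1883")   # assumed local broker
        topic("logs/incoming")            # topic to subscribe to
    );
};

destination d_mqtt {
    mqtt(
        address("tcp://localhost:1883")
        topic("logs/${HOST}")             # one topic per sending host
        template("$(format-json --scope rfc5424)")
    );
};

destination d_file { file("/var/log/from-mqtt.log"); };

# subscriber side: store everything received from the broker
log { source(s_mqtt); destination(d_file); };
```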

Proposed by: Laszlo Varady

Mentor: Laszlo Varady

Co-mentor: Laszlo Budai

Difficulty: medium

Deliverables of the project:

  • the functioning MQTT source and/or destination

Desirable skills:

  • Familiarity with C
  • Familiarity with syslog-ng
  • Familiarity with TCP/IP
  • Familiarity with TLS

What the student will learn:

  • How the MQTT protocol works
  • What a publisher/subscriber model looks like in C
  • Different aspects and difficulties of high-performance message transfer
  • How to get code reviewed and merged in an open source project

Investigate the current state of syslog-ng on MacOS

Brief description:

MacOS is currently not supported by syslog-ng. Only building the project and running the unit tests are guaranteed to work. syslog-ng has a wide variety of sources/destinations, which might or might not work on MacOS. This could be investigated step by step, one driver at a time.

The student should build and try out the available drivers of syslog-ng on the MacOS platform and write a summary of the results. For drivers that do not work, a brief root-cause analysis is needed. An additional task could be to fix the features that currently do not work.

Proposed by: Attila Szakacs

Mentor: Attila Szakacs

Difficulty: medium

Desirable skills:

  • Computer with MacOS
  • Familiarity with MacOS
  • Familiarity with syslog-ng

What the student will learn:

  • The topology of the syslog-ng supported logging technologies
  • How to write MacOS compatible Makefiles
  • How to test a project with multiple optional dependencies on MacOS
  • How to write a summary/documentation about an investigation/research

Add support for reingesting orphaned disk queue files

Brief description:

When a config change removes a destination that has diskq enabled, its diskq becomes orphaned. By the end of this task, syslog-ng should have a feature that enables the user to reingest these files so that the remaining messages can be sent.

This should be done on demand via syslog-ng-ctl. Message acknowledgement and flow-control must be handled.

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler, Attila Szakacs

Difficulty: medium

Deliverables of the project:

  • the functioning feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • The main concepts of reliable log handling within a single process
  • How to get code reviewed and merged in an open source project

Add support for proper lists into json-parser()

Brief description:

syslog-ng recently gained the ability to work with lists, using the $(list-*) family of template functions. These lists are simply comma-separated lists of elements, represented as a string in syslog-ng's name-value pairs. The representation supports a simple escape mechanism, so values that contain a comma still work.

The parsing of arrays in json-parser() predates this feature, so it encodes elements of the incoming array into separate name-value pairs, which is not easy to work with and is not really useful.

The goal of this item is to turn JSON arrays into well-formed lists and to validate (i.e. test) that:

  • the list functions work on them
  • the $(format-json) template function turns these lists back into JSON arrays (and implement this if it does not work yet)
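
As an illustration of the intended result, assume an incoming message whose payload is {"user": "joe", "groups": ["wheel", "adm"]}. With the feature in place, a configuration along the following lines should be able to apply the list functions to the parsed array. The port, prefix, file path and field names are made up for the example, and a list-valued ${.json.groups} is the proposed behaviour, not the current one.

```
# Receive raw JSON payloads; flags(no-parse) keeps the whole line in ${MESSAGE}.
source s_json {
    network(transport("tcp") port(5514) flags(no-parse));
};

parser p_json {
    json-parser(prefix(".json."));
};

destination d_groups {
    file("/var/log/groups.log"
         # proposed: ${.json.groups} holds the list wheel,adm so that
         # $(list-head) expands to its first element
         template("${.json.user} first-group=$(list-head ${.json.groups})\n"));
};

log {
    source(s_json);
    parser(p_json);
    destination(d_groups);
};
```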

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler, Attila Szakacs

Difficulty: easy

Deliverables of the project:

  • the functioning feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • The main concepts of reliable log handling within a single process
  • How to get code reviewed and merged in an open source project

Extend zap logger (golang) with structured syslog-ng encoder

Brief description:

The goal is to implement a syslog-ng encoder and a sink for the zap[1] logger. The encoder should be able to keep the structured format of the log messages (e.g. key-value pairs, JSON). The other part of the task is to add an SCL[2] to syslog-ng for receiving and parsing these logs.

Tasks:

  • Choose format
  • Implement encoder
  • Implement syslog-ng SCL module
  • Documentation, blog post

Proposed by: Laszlo Budai

Mentor: Laszlo Budai (co-mentor: Laszlo Varady)

Difficulty: medium

Deliverables of the project:

  • A zap logger extension and its syslog-ng counterpart - with the two anyone can easily add advanced logging capabilities into their solutions (filter messages based on key-value pairs, parse metrics and send to a monitoring system, deliver log messages to SIEMs, and so on).

Desirable skills:

  • Familiarity with the Go programming language
  • Familiarity with syslog-ng

What the student will learn:

  • How to format a simple message as a log message in Go
  • How to implement a zap extension in Go
  • How to implement a syslog-ng SCL module

Add EOS (Exactly Once Semantics) support to kafka-c destination driver

Brief description:

The goal is to extend the kafka-c destination driver with EOS by using the transactional producer API. The Kafka destination uses librdkafka, which supports transactions from v1.4.0.

Tasks:

  • Modify existing kafka destination driver
  • Implement new test cases
  • Documentation, blog post

Proposed by: Laszlo Budai

Mentor: Laszlo Budai (co-mentor: Kokan)

Difficulty: hard

Deliverables of the project: Kafka destination driver supports a new messaging semantics.

Desirable skills:

  • Familiarity with the C programming language
  • Familiarity with Kafka
  • Familiarity with librdkafka

What the student will learn:

  • How http destination drivers work in syslog-ng
  • librdkafka

Ideas that still need detailing

  1. support for RFC6587-style protocol auto-detection (e.g. deciding whether byte-counting is in use, based on the first bytes of a syslog connection)
  2. add support for Elastic's beats protocol so that syslog-ng can natively interface with any Elastic agent
  3. csv-parser() enhancements: $(list-*) template function compatible output; allow omitting the columns() argument, in which case it uses $1, $2, $3 as values
  4. add $(dns-resolve-name) template function
  5. add IPv4 CIDR-based matching to add-contextual-data(); once completed, add IPv6 too
  6. rotate log files based on size (https://github.com/syslog-ng/syslog-ng/issues/2964)