
GSoC2020 Idea & Project list

Balazs Scheidler edited this page Mar 23, 2020 · 22 revisions

Table of contents

  1. Guidelines

    1. Hosting, license and other bits of information
    2. Adding a new idea
    3. Submitting a proposal
  2. Ideas

    1. Add support for templates in the topic() option of the kafka() destination
    2. Add regexp-parser()
    3. Create MQTT source and/or destination
    4. Investigate the current state of syslog-ng on MacOS
    5. Add support for reingesting orphaned disk queue files
    6. Add support for proper lists in json-parser()
  3. Ideas that still need details


Guidelines

The ideas herein were contributed by the syslog-ng OSE community: developers, users, and interested students. Some of them may be vague or incomplete. If you are a student and would like to apply to the Google Summer of Code, ask about any of these ideas on the mailing list, or contact the person who proposed it.

Being accepted as a Google Summer of Code (GSoC) student is not easy, because the selection is competitive. Research the desired topic in depth, and contact the mentor and the community. If you have a new idea and would like to add it to the list, discuss it with the community at large and with the developers, to make sure that a mentor will be available for the project if it is selected.

If no specific contact is given for a particular idea, ask your questions on the mailing list or on Gitter.

Hosting, license and other bits of information

As required by the Google Summer of Code program, all contributions must be available under an open source license. syslog-ng uses two licenses, the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL), and the outcome of each GSoC project will need to use one of them. Consult the mentor of the idea for details.

We also prefer to develop in the open: communication happens on the mailing list or on Gitter, and code is hosted on GitHub, where the main repository lives. Depending on the proposal, students will be asked to fork syslog-ng.

Adding a new idea

Before adding a new idea, consult the community and the mentors (see above), then follow the template used by the other ideas: a title, a brief description, expected results, required skills, difficulty, and the topics the student may learn. See the existing ideas below.

Submitting a proposal

To submit your proposal, create a new Wiki page for it, as described in the GSoC 2020 Proposals document. Do not forget to also record your proposal on the Google Summer of Code page by posting a link to the previously created Wiki page.


Ideas

Add support for templates in the topic() option of the kafka() destination

Brief description:

Recently, a C-based implementation of the kafka() destination was added to replace the older Java-based one, for performance reasons. One feature that the Java implementation supported, but the current C-based one does not, is the template syntax in the topic() parameter.

The goal of this idea is to implement support for the standard syslog-ng template syntax in this parameter, making it possible to distribute messages among multiple Kafka topics based on the value of a syslog-ng name-value pair.

Example:


# this example causes syslog-ng to produce log messages on a per-host topic,
# called syslog-ng.<HOSTNAME>
destination d_kafka {
                kafka(config(
                             "queue.buffering.max.ms" => "1000",
                             "message.timeout.ms" => "5000",
                             "debug" => "all"
                      )
                      bootstrap-servers("localhost:9092")
                      flush-timeout-on-shutdown(1000)
                      topic("syslog-ng.$HOST")
                      key("${HOST}_${PROGRAM}_${PID}")
                      message("$(format-json --scope nv-pairs)"));

};
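At its core, the feature means evaluating the topic() template once per message and using the result as the topic name. The sketch below illustrates only that expansion step, under simplifying assumptions: the `NVPair` array stands in for a log message, and `expand_topic()` is a hypothetical helper that understands only `$NAME` and `${NAME}` references. A real implementation would reuse syslog-ng's own template machinery (which also handles template functions such as `$(format-json)`) rather than a hand-written parser.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the name-value pairs of a log message. */
typedef struct
{
  const char *name;
  const char *value;
} NVPair;

/* Returns the value of the named pair, or "" if it is not set. */
static const char *
lookup_value(const NVPair *pairs, size_t n_pairs, const char *name, size_t name_len)
{
  for (size_t i = 0; i < n_pairs; i++)
    if (strlen(pairs[i].name) == name_len && strncmp(pairs[i].name, name, name_len) == 0)
      return pairs[i].value;
  return "";
}

/* Expands $NAME and ${NAME} references in tmpl into out.
 * Returns 0 on success, -1 if the result does not fit into out_size bytes. */
static int
expand_topic(const char *tmpl, const NVPair *pairs, size_t n_pairs,
             char *out, size_t out_size)
{
  size_t pos = 0;
  const char *p = tmpl;

  if (out_size == 0)
    return -1;

  while (*p)
    {
      const char *chunk;
      size_t chunk_len;
      const char *brace = (p[0] == '$' && p[1] == '{') ? strchr(p + 2, '}') : NULL;

      if (brace)
        {
          /* ${NAME}: the name ends at the closing brace */
          chunk = lookup_value(pairs, n_pairs, p + 2, (size_t) (brace - p - 2));
          chunk_len = strlen(chunk);
          p = brace + 1;
        }
      else if (p[0] == '$' && (isalnum((unsigned char) p[1]) || p[1] == '_'))
        {
          /* $NAME: the name is the longest run of alphanumerics and '_' */
          const char *end = p + 1;
          while (isalnum((unsigned char) *end) || *end == '_')
            end++;
          chunk = lookup_value(pairs, n_pairs, p + 1, (size_t) (end - p - 1));
          chunk_len = strlen(chunk);
          p = end;
        }
      else
        {
          /* literal character, copied as-is */
          chunk = p;
          chunk_len = 1;
          p++;
        }

      if (pos + chunk_len >= out_size)
        return -1;
      memcpy(out + pos, chunk, chunk_len);
      pos += chunk_len;
    }
  out[pos] = '\0';
  return 0;
}
```

With `HOST=web01`, the `topic("syslog-ng.$HOST")` template in the example above expands to `syslog-ng.web01`. Kafka accepts only ASCII alphanumerics, `.`, `_` and `-` in topic names, so the real feature will probably also need to decide what happens when a name-value pair expands to an invalid topic name.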

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler

Difficulty: easy

Deliverables of the project:

  • the functioning feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with Kafka
  • Familiarity with C
  • Familiarity with parallel programming/mutexes

What the student will learn:

  • How to deploy/use Kafka as a scalable, distributed, persistent queueing mechanism
  • The basics of librdkafka, the library that implements the Kafka protocol in C/C++
  • How to get code reviewed and merged in an open source project

Add regexp-parser()

Brief description:

syslog-ng has traditionally used regexps for filtering (e.g. to route a message to a different destination based on a regexp match). However, regular expressions can also be useful (although relatively slow) for extracting the variable parts of log messages. This is already possible today, but it is a bit unintuitive:

# example, using a regexp to extract name value pairs from a message

filter f_extract {
    # if the pattern matches:
    #     $0 - populated with the entire match
    #     $1 - populated with the value enclosed in parentheses (the first capture group)
    program("postfix/([a-z]+)" flags(store-matches));
};

Difficulties with the syntax above:

  • you need to use flags(store-matches), as fields extracted by regexps are not stored by default
  • the fields end up in the "matches" namespace ($0, $1, $2 ...), whose names are hard to remember and which get overwritten by the next regexp
  • one can set() a "normal" name-value pair from the value of $1 using a rewrite rule, but the config quickly becomes cumbersome and unreadable
  • one can also set a "normal" name-value pair using the PCRE named capture group syntax (see the "named subpatterns" section of the pcrepattern(3) man page), but dots cannot be used in these names, although dots are commonly used in syslog-ng name-value pair names
  • filters should not modify the message (which they do with flags(store-matches))

Solution:

  • the goal is to create a regexp-parser() construct that is a parser, not a filter
  • the regexp-parser() should:
    • accept multiple regular expressions, processed in order; the first one that matches stops the processing
    • support the prefix() option common to other parsers, so that the names of named subpatterns can be prefixed
    • drop the message (by returning FALSE) if none of the regexps match
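The matching rules listed above (patterns tried in order, first match wins, prefixed field names, FALSE when nothing matches) can be sketched as follows. This is only an illustration: it swaps in POSIX `<regex.h>` for PCRE, so it supports numbered capture groups only, not named subpatterns, and `ExtractedValue` and `regexp_parser_process()` are hypothetical names, not syslog-ng APIs.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 10

/* Hypothetical container for one extracted name-value pair. */
typedef struct
{
  char name[64];
  char value[256];
} ExtractedValue;

/* Tries the compiled patterns in order; the first one that matches stops the
 * processing, and its capture groups are stored as <prefix><group number>.
 * out must have room for MAX_GROUPS entries. Returns false if no pattern
 * matches, i.e. the message should be dropped. */
static bool
regexp_parser_process(const regex_t *patterns, size_t n_patterns,
                      const char *prefix, const char *input,
                      ExtractedValue *out, size_t *n_out)
{
  regmatch_t matches[MAX_GROUPS];

  *n_out = 0;
  for (size_t i = 0; i < n_patterns; i++)
    {
      if (regexec(&patterns[i], input, MAX_GROUPS, matches, 0) != 0)
        continue;  /* no match, try the next pattern */

      for (size_t g = 0; g < MAX_GROUPS && g <= patterns[i].re_nsub; g++)
        {
          if (matches[g].rm_so == -1)
            continue;  /* optional group that did not participate */
          ExtractedValue *v = &out[(*n_out)++];
          snprintf(v->name, sizeof(v->name), "%s%zu", prefix, g);
          snprintf(v->value, sizeof(v->value), "%.*s",
                   (int) (matches[g].rm_eo - matches[g].rm_so),
                   input + matches[g].rm_so);
        }
      return true;
    }
  return false;
}
```

For example, with the prefix `.regexp.` and the pattern `^sshd\[([0-9]+)]`, the input `sshd[1234]: Accepted` yields `.regexp.0` set to `sshd[1234]` and `.regexp.1` set to `1234`.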

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler

Difficulty: medium

Deliverables of the project:

  • the functioning feature, tests, documentation
  • performance test (how many messages can be processed by a single regexp per second)

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C
  • Familiarity with parallel programming/mutexes

What the student will learn:

  • The basics of the PCRE library, how to compile/execute regular expressions
  • How to get code reviewed and merged in an open source project

Create MQTT source and/or destination

Brief description:

MQTT is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium. (http://mqtt.org/)

The student should create an MQTT subscriber (source) and/or publisher (destination) in syslog-ng, on top of the already existing TCP/IP implementation.
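To give a feel for the wire format, the sketch below serializes an MQTT 3.1.1 PUBLISH packet with QoS 0: a fixed header byte, the variable-length "Remaining Length" field, the length-prefixed topic name, and the payload. It covers framing only; a working source or destination also needs CONNECT/CONNACK, keep-alive, the higher QoS levels and, most likely, an existing MQTT client library. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MQTT_MAX_REMAINING_LENGTH 268435455u  /* 4 bytes of 7 bits each */

/* Encodes the "Remaining Length" field: 7 bits per byte, least significant
 * group first, high bit set when more bytes follow. Returns the number of
 * bytes written (1-4), or 0 if length is too large. */
static size_t
mqtt_encode_remaining_length(uint32_t length, uint8_t out[4])
{
  size_t n = 0;

  if (length > MQTT_MAX_REMAINING_LENGTH)
    return 0;
  do
    {
      uint8_t byte = (uint8_t) (length % 128);
      length /= 128;
      if (length > 0)
        byte |= 0x80;  /* continuation bit */
      out[n++] = byte;
    }
  while (length > 0);
  return n;
}

/* Serializes a QoS 0 PUBLISH packet (QoS 0 carries no packet identifier).
 * Returns the packet size, or 0 if the input is invalid or buf is too small. */
static size_t
mqtt_build_publish_qos0(const char *topic, const uint8_t *payload, size_t payload_len,
                        uint8_t *buf, size_t buf_size)
{
  size_t topic_len = strlen(topic);
  uint8_t len_field[4];
  size_t remaining, n_len, total;

  /* topic names must be 1..65535 bytes long (2-byte length prefix) */
  if (topic_len == 0 || topic_len > 0xFFFF || payload_len > MQTT_MAX_REMAINING_LENGTH)
    return 0;
  remaining = 2 + topic_len + payload_len;
  if (remaining > MQTT_MAX_REMAINING_LENGTH)
    return 0;
  n_len = mqtt_encode_remaining_length((uint32_t) remaining, len_field);
  total = 1 + n_len + remaining;
  if (total > buf_size)
    return 0;

  buf[0] = 0x30;                                /* type 3 = PUBLISH; DUP, QoS, RETAIN = 0 */
  memcpy(buf + 1, len_field, n_len);
  buf[1 + n_len] = (uint8_t) (topic_len >> 8);  /* topic length, big-endian */
  buf[2 + n_len] = (uint8_t) (topic_len & 0xFF);
  memcpy(buf + 3 + n_len, topic, topic_len);
  if (payload_len > 0)
    memcpy(buf + 3 + n_len + topic_len, payload, payload_len);
  return total;
}
```

For example, publishing the two-byte payload `hi` to the topic `a/b` produces a 9-byte packet: `0x30 0x07 0x00 0x03 'a' '/' 'b' 'h' 'i'`.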

Proposed by: Laszlo Varady

Mentor: Attila Szakacs

Difficulty: medium

Deliverables of the project:

  • the functioning MQTT source and/or destination

Desirable skills:

  • Familiarity with C
  • Familiarity with syslog-ng
  • Familiarity with TCP/IP
  • Familiarity with TLS

What the student will learn:

  • How the MQTT protocol works
  • What a publisher/subscriber model looks like in C
  • Different aspects and difficulties of high-performance message transfer
  • How to get code reviewed and merged in an open source project

Investigate the current state of syslog-ng on MacOS

Brief description:

MacOS is currently not a supported platform for syslog-ng: only building syslog-ng and running its unit tests are guaranteed to work. syslog-ng has a wide variety of sources and destinations, which may or may not work on MacOS. This could be investigated step by step, one driver at a time.

The student should build syslog-ng on MacOS, try out its available drivers, and write a summary of the results. For drivers that do not work, a brief root-cause analysis is needed. An additional task could be to fix the features that currently do not work.

Proposed by: Attila Szakacs

Mentor: Attila Szakacs

Difficulty: medium

Desirable skills:

  • Computer with MacOS
  • Familiarity with MacOS
  • Familiarity with syslog-ng

What the student will learn:

  • The landscape of logging technologies supported by syslog-ng
  • How to write MacOS-compatible Makefiles
  • How to test a project with multiple optional dependencies on MacOS
  • How to write a summary/documentation about an investigation/research

Add support for reingesting orphaned disk queue files

Brief description:

When a configuration change removes a destination that has a disk queue (diskq) enabled, its disk queue file becomes orphaned. By the end of this task, syslog-ng should have a feature that enables the user to reingest these files, so that the remaining messages can be sent.

This should be triggered on demand via syslog-ng-ctl. Message acknowledgement and flow control must be handled.
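The acknowledgement and flow-control requirement can be illustrated with a small model: messages are read from the orphaned queue only while the number of unacknowledged (in-flight) messages stays below a window, and the queue file may be removed only after every message has been acknowledged. This is a hypothetical, in-memory sketch (`ReingestState` and the `reingest_*()` functions are invented for illustration); the real feature would read records from the orphaned disk queue file and plug into syslog-ng's existing flow-control mechanism.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of an orphaned queue being reingested. */
typedef struct
{
  size_t total;      /* messages stored in the orphaned queue file */
  size_t read_head;  /* index of the next message to forward */
  size_t ack_head;   /* messages below this index are acknowledged */
  size_t window;     /* max number of unacknowledged (in-flight) messages */
} ReingestState;

/* Returns true and sets *index if a message may be forwarded now; false if
 * the queue is drained or the flow-control window is full. */
static bool
reingest_next(ReingestState *s, size_t *index)
{
  if (s->read_head >= s->total)
    return false;
  if (s->read_head - s->ack_head >= s->window)
    return false;  /* window full: wait for acknowledgements */
  *index = s->read_head++;
  return true;
}

/* Called when the destination acknowledges n messages, in order. */
static void
reingest_ack(ReingestState *s, size_t n)
{
  size_t in_flight = s->read_head - s->ack_head;
  s->ack_head += n < in_flight ? n : in_flight;
}

/* The queue file may only be deleted once every message was acknowledged. */
static bool
reingest_done(const ReingestState *s)
{
  return s->ack_head == s->total;
}
```

Keeping the acknowledged position separate from the read position means a crash or reload in the middle of reingestion only has to resend the in-flight messages, not the whole file.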

Proposed by: Balazs Scheidler

Mentors: Balazs Scheidler, Attila Szakacs

Difficulty: medium

Deliverables of the project:

  • the functioning feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • The main concepts of reliable log handling within a single process
  • How to get code reviewed and merged in an open source project

Add support for proper lists in json-parser()

Brief description:

syslog-ng recently gained the ability to work with lists, using the $(list-*) family of template functions. These lists are simply comma-separated lists of elements, represented as a string in syslog-ng's name-value pairs. The representation supports a simple escaping mechanism, so it works even if the values themselves contain commas.

The parsing of arrays in json-parser() predates this feature: it stores the elements of an incoming array in separate name-value pairs, which is not easy to work with and not really useful.

The goal of this item is to turn JSON arrays into well-formed lists and to validate (i.e. test) that:

  • the list functions work on them
  • the $(format-json) template function turns these lists back into JSON arrays (and to make it do so if it currently does not)
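Turning a JSON array into a list essentially means appending each element to a single comma-separated string. The sketch below assumes, for illustration only, an escaping rule where elements that are empty or contain a comma, a double quote or a backslash are wrapped in double quotes, with `"` and `\` backslash-escaped inside; the exact rules used by the `$(list-*)` functions should be checked in the syslog-ng source before relying on them. `list_append()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Appends one character, keeping out NUL-terminated; false if out is full. */
static bool
put_char(char *out, size_t out_size, size_t *pos, char c)
{
  if (*pos + 1 >= out_size)
    return false;
  out[(*pos)++] = c;
  out[*pos] = '\0';
  return true;
}

/* Appends element to the comma-separated list in out (initialize out to "").
 * Assumed escaping: empty elements and elements containing ',', '"' or '\\'
 * are double-quoted, and '"' / '\\' are backslash-escaped inside the quotes.
 * Returns false if out is too small. */
static bool
list_append(char *out, size_t out_size, const char *element)
{
  size_t pos = strlen(out);
  bool quote = element[0] == '\0' || strpbrk(element, ",\"\\") != NULL;

  if (pos > 0 && !put_char(out, out_size, &pos, ','))
    return false;
  if (quote && !put_char(out, out_size, &pos, '"'))
    return false;
  for (const char *p = element; *p; p++)
    {
      if (quote && (*p == '"' || *p == '\\') && !put_char(out, out_size, &pos, '\\'))
        return false;
      if (!put_char(out, out_size, &pos, *p))
        return false;
    }
  if (quote && !put_char(out, out_size, &pos, '"'))
    return false;
  return true;
}
```

Under this assumed scheme, appending `foo`, `bar,baz` and `say "hi"` in turn yields `foo,"bar,baz","say \"hi\""`. Quoting empty elements keeps them from silently disappearing from the list.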

Proposed by: Balazs Scheidler

Mentors: Balazs Scheidler, Attila Szakacs

Difficulty: easy

Deliverables of the project:

  • the functioning feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • The main concepts of reliable log handling within a single process
  • How to get code reviewed and merged in an open source project

Ideas that still need details

  1. Support for RFC6587-style protocol auto-detection (e.g. whether byte (octet) counting is needed or not, based on the first few bytes of a syslog connection)
  2. Add support for Elastic's Beats protocol, so that syslog-ng can natively interface with any Elastic agent
  3. csv-parser() enhancements: output compatible with the $(list-*) template functions; allow omitting the columns() argument, in which case the values are stored in $1, $2, $3, ...
  4. Add a $(dns-resolve-name) template function
  5. Add IPv4 CIDR-based matching to add-contextual-data(); once completed, add IPv6 too
  6. Rotate log files based on size https://github.com/syslog-ng/syslog-ng/issues/2964
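For the first item, the two framing methods of RFC 6587 can be told apart from the first bytes of a connection: octet-counted frames start with the message length (a non-zero digit followed by more digits) and a single space, while non-transparently framed messages start directly with the `<PRI>` part. The sketch below is a minimal, hypothetical detector built on these assumptions; a real implementation would also need to cap the number of length digits and cope with non-standard senders.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum
{
  FRAMING_NEED_MORE,        /* not enough bytes to decide yet */
  FRAMING_OCTET_COUNTING,   /* "MSG-LEN SP SYSLOG-MSG" */
  FRAMING_NON_TRANSPARENT,  /* message starts with "<PRI>", trailer-delimited */
  FRAMING_UNKNOWN           /* neither form */
} SyslogFraming;

/* Guesses the framing from the first len bytes received on a connection. */
static SyslogFraming
detect_framing(const char *buf, size_t len)
{
  if (len == 0)
    return FRAMING_NEED_MORE;
  if (buf[0] == '<')
    return FRAMING_NON_TRANSPARENT;
  if (buf[0] < '1' || buf[0] > '9')
    return FRAMING_UNKNOWN;  /* MSG-LEN must start with a non-zero digit */

  for (size_t i = 1; i < len; i++)
    {
      if (buf[i] == ' ')
        return FRAMING_OCTET_COUNTING;
      if (!isdigit((unsigned char) buf[i]))
        return FRAMING_UNKNOWN;
    }
  return FRAMING_NEED_MORE;  /* only digits so far */
}
```

Returning a separate "need more" state lets the caller keep buffering when the first read happens to end in the middle of the length prefix, instead of guessing wrong.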