
GSoC2022 Idea & Project list

László Várady edited this page Apr 19, 2022 · 17 revisions

Table of contents

  1. Guidelines

    1. Hosting, license and other bits of information
    2. Adding a new idea
    3. Submitting a proposal
  2. Ideas

    1. Add support for reingesting orphaned disk queue files
    2. Syslog network transport protocol auto-detection (RFC6587-style)
    3. Rotate log files based on size
    4. Template function for DNS name resolution
    5. csv-parser() enhancements: $(list-*) template function compatible output
    6. Support Elastic's beats protocol
  3. Ideas that still need detailing


Guidelines

The ideas herein were contributed by the syslog-ng OSE community: by developers, users, and interested students. Some of them may be vague or incomplete. If you are a student and would like to apply to the Google Summer of Code, either ask about any of these ideas on the mailing list, or contact the person proposing it.

Being accepted as a Google Summer of Code (GSoC) student is not an easy task; it is competitive. Research the desired topic in depth, and contact the mentor and the community. If you have a new idea and would like to add it to the list, talk to the community at large, and to the developers too, to ensure that there will be a mentor for the project if it is selected.

In case no specific contact is given for a particular idea, questions can be asked on the mailing list or on Gitter.

Hosting, license and other bits of information

As required by the Google Summer of Code program, all contributions must be available under an open source license. syslog-ng uses two licenses, the GNU General Public License (GPL) and the GNU Lesser General Public License (LGPL), and the outcome of the GSoC projects will need to use one of them. Consult the mentor of the idea for details.

We also prefer to do development in the open, with communication happening on the mailing list or on Gitter, and code hosted on GitHub, where the main repository is. Depending on the proposal, students will be asked to fork syslog-ng.

Adding a new idea

Before adding a new idea, consult the community and the mentors (see above), then follow the template set by the other ideas: a title, a brief description, expected results, skills required, difficulty, and topics the student may learn. See the existing ideas below.

Submitting a proposal

To submit your proposal, create a new Wiki page for it, as described in the GSoC 2022 Proposals document. Do not forget to also record your proposal on the Google Summer of Code page by posting a link to the previously created Wiki page.


Ideas

Add support for reingesting orphaned disk queue files

Brief description:

syslog-ng supports disk-based buffering with its disk-buffer() module, which is an essential feature for reliable log management. It makes it possible to persist a large number of messages while the destination is unavailable, even across syslog-ng or OS restarts and upgrades. It also helps strengthen the delivery guarantees of syslog-ng in such cases.

When a config change ends up removing a destination with disk-buffer enabled, its queue becomes orphaned, which means the disk queue gets disassociated from its destination. Currently, there is no convenient method for processing such "orphaned" disk queues.

By the end of this task, syslog-ng should have a feature that enables the user to reingest these files, so that the remaining messages can be sent.

This should be done on demand via syslog-ng-ctl. Message acknowledgement and flow-control must be handled.
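The acknowledgement and flow-control requirement can be illustrated with a minimal sketch, assuming a fixed-size window and in-order (cumulative) acknowledgements. The ReingestState type and the reingest_* functions are hypothetical and not part of syslog-ng's disk-buffer code; the point is that messages are read from the orphaned file only while the window has room, and the file may be removed only once every message has been acknowledged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state for reingesting one orphaned disk queue file.
 * Assumes acknowledgements arrive in order (cumulative acks). */
typedef struct
{
  size_t total;         /* messages left in the orphaned queue file */
  size_t next_to_send;  /* index of the next message to read and forward */
  size_t acked;         /* messages confirmed by the destination */
  size_t window;        /* max messages in flight (flow-control window) */
} ReingestState;

/* True if there are unsent messages and the window is not full. */
static bool
reingest_can_send(const ReingestState *s)
{
  return s->next_to_send < s->total
         && s->next_to_send - s->acked < s->window;
}

/* Forwards the next message; the caller must check reingest_can_send() first.
 * Returns the index of the message that was forwarded. */
static size_t
reingest_send_next(ReingestState *s)
{
  return s->next_to_send++;
}

/* Cumulative ack: the destination confirmed n more messages. */
static void
reingest_ack(ReingestState *s, size_t n)
{
  if (s->acked + n > s->next_to_send)
    n = s->next_to_send - s->acked;  /* never ack what was not sent */
  s->acked += n;
}

/* Only when every message is acknowledged may the file be removed. */
static bool
reingest_done(const ReingestState *s)
{
  return s->acked == s->total;
}
```

With a window of 2, the third message is only read after the first one is acknowledged, so a slow destination throttles the reading of the orphaned file instead of the messages piling up in memory.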

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler, László Várady

Difficulty: medium

Project size: 175 hours

Deliverables of the project:

  • The functioning feature
  • Functional tests
  • Documentation of the new feature

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • The main concepts of reliable log handling within a single process
  • How to get code reviewed and merged in an open-source project

Syslog network transport protocol auto-detection (RFC6587-style)

Brief description:

syslog-ng supports a wide variety of network transports, such as TCP, UDP, TLS, proxied-TCP/TLS, HTTP, etc.

Traditional syslog protocols are based on plain UDP, TCP, and TLS, but even for such a simple protocol there are multiple standardized methods of message transfer, described in RFC6587. Octet-counting is preferred nowadays, but the older non-transparent-framing method is still widely used. Currently, syslog-ng cannot auto-detect these framing methods; it is the user's responsibility to configure syslog-ng according to the protocol used by the log forwarders.

The goal is to implement an auto-detection logic for the syslog() source based on the first bytes of the incoming connection.
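The two framing methods can be told apart from the first bytes: with octet-counting (RFC6587, section 3.4.1) a frame starts with MSG-LEN, a non-zero digit followed by further digits and a space, while with non-transparent framing (section 3.4.2) the frame is the syslog message itself, which starts with the "<" of its PRI part. A minimal C sketch of this check follows; the function, enum, and the 10-digit cap on MSG-LEN are hypothetical, and the sketch ignores TLS and other transports.

```c
#include <stddef.h>

typedef enum
{
  FRAMING_NEED_MORE_DATA,   /* not enough bytes to decide yet */
  FRAMING_OCTET_COUNTING,   /* RFC6587 3.4.1: "MSG-LEN SP SYSLOG-MSG" */
  FRAMING_NON_TRANSPARENT,  /* RFC6587 3.4.2: "<PRI>..." ended by a trailer, usually LF */
  FRAMING_UNKNOWN           /* neither pattern matches */
} SyslogFraming;

/* Upper bound on MSG-LEN digits, to bound the lookahead (assumption). */
#define MAX_MSG_LEN_DIGITS 10

static SyslogFraming
detect_framing(const unsigned char *buf, size_t len)
{
  if (len == 0)
    return FRAMING_NEED_MORE_DATA;

  /* A syslog message (RFC3164 or RFC5424) starts with "<PRI>". */
  if (buf[0] == '<')
    return FRAMING_NON_TRANSPARENT;

  /* MSG-LEN = NONZERO-DIGIT *DIGIT */
  if (buf[0] < '1' || buf[0] > '9')
    return FRAMING_UNKNOWN;

  size_t i = 1;
  while (i < len && i < MAX_MSG_LEN_DIGITS && buf[i] >= '0' && buf[i] <= '9')
    i++;

  if (i == len)
    return FRAMING_NEED_MORE_DATA;

  /* MSG-LEN must be followed by a single space. */
  return buf[i] == ' ' ? FRAMING_OCTET_COUNTING : FRAMING_UNKNOWN;
}
```

The check is unambiguous because a PRI part can never start with a digit and MSG-LEN can never start with "<"; a real implementation would also have to cope with detection data arriving split across several reads, which is what FRAMING_NEED_MORE_DATA models.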

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler, László Várady

Difficulty: easy

Project size: 175 hours

Deliverables of the project:

  • Auto-detection mechanism implemented
  • Unit tests, end-to-end functional tests
  • Documentation

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C
  • Familiarity with TCP/IP

What the student will learn:

  • About widely used syslog formats (RFC5424, RFC3164) and transport protocols (RFC6587, RFC5425)
  • How syslog parsers are implemented (performance is key)
  • How to write unit tests in C following clean-code principles
  • How to get code reviewed and merged in an open-source project

Rotate log files based on size (#2964)

Brief description:

In syslog-ng, log rotation can currently be achieved using macros in the file name (for example, ${HOST}/${YEAR}_${MONTH}_${DAY}.log), or with external tools (scripting, configuring logrotate), which require syslog-ng to be reloaded or its files to be reopened (syslog-ng-ctl reload/reopen).

The goal of this project is to implement a new log rotation mechanism natively, based on the size of the written file.

More information about the suggested design: #2966
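A minimal C sketch of size-based rotation, assuming a numbered-suffix scheme (file, file.1, ..., file.N) similar to logrotate's default; it is not necessarily the design proposed in #2966. should_rotate(), rotated_name() and rotate_files() are hypothetical helpers, and rotate_files() relies on POSIX rename() replacing an existing target.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: would appending msg_len bytes push the file past max_size?
 * A max_size of 0 disables size-based rotation. */
static bool
should_rotate(long long current_size, size_t msg_len, long long max_size)
{
  return max_size > 0 && current_size + (long long) msg_len > max_size;
}

/* Hypothetical: builds "<base>.<index>", e.g. "messages.log.2".
 * Returns false if the result does not fit into out. */
static bool
rotated_name(char *out, size_t out_len, const char *base, int index)
{
  int n = snprintf(out, out_len, "%s.%d", base, index);
  return n >= 0 && (size_t) n < out_len;
}

/* Hypothetical: renames base.(max_files-1) -> base.max_files, ..., base -> base.1,
 * so the previous base.max_files is replaced (POSIX rename() overwrites an
 * existing target). Missing files are skipped.
 * Returns 0 on success, -1 on the first other error. */
static int
rotate_files(const char *base, int max_files)
{
  char from[4096];
  char to[4096];

  for (int i = max_files - 1; i >= 0; i--)
    {
      int n = (i == 0) ? snprintf(from, sizeof(from), "%s", base)
                       : snprintf(from, sizeof(from), "%s.%d", base, i);
      if (n < 0 || (size_t) n >= sizeof(from))
        return -1;
      if (!rotated_name(to, sizeof(to), base, i + 1))
        return -1;

      /* A file may legitimately not exist yet (fewer rotations so far). */
      if (rename(from, to) != 0 && errno != ENOENT)
        return -1;
    }
  return 0;
}
```

In a file destination, should_rotate() would be checked before each write; when it returns true, the file would be closed, rotate_files() called, and the base name reopened, all without an external reload.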

Mentor: László Várady

Difficulty: medium

Project size: 175 hours

Deliverables of the project:

  • The feature implemented
  • Unit and functional tests
  • Documentation

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • How to implement a new feature in an asynchronous, non-blocking "ecosystem" written in C
  • How to apply "clean code" concepts to C
  • How to get code reviewed and merged in an open-source project

Template function for DNS name resolution

Brief description:

syslog-ng supports different DNS resolution techniques on the source side (when processing incoming messages), using the HOST field of the incoming message or the peer information available from the connection. However, the source side is not the only place where DNS name resolution can be useful. As syslog-ng operates on structured messages and has a wide variety of message enrichment capabilities, it would be useful to provide a method that can resolve domain names from any key-value pair or template.

Similarly to the existing $(dns-resolve-ip) function, a new template function called $(dns-resolve-name) should be implemented, which resolves a given hostname/FQDN to an IPv4/IPv6 address.

The template function should support and conform to syslog-ng's DNS-related options.
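A minimal sketch of the resolution step itself, using POSIX getaddrinfo() and inet_ntop() and returning the first address found. resolve_name() is a hypothetical helper: it does not honor syslog-ng's DNS-related options or caching, and getaddrinfo() blocks, so a real template function would have to account for that inside syslog-ng's non-blocking main loop.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical: resolves name to the textual form of its first IPv4/IPv6
 * address. Returns 0 on success, -1 on failure. Blocks while resolving. */
static int
resolve_name(const char *name, char *out, socklen_t out_len)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;

  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 results */
  hints.ai_socktype = SOCK_STREAM;  /* avoid duplicate entries per socket type */

  if (getaddrinfo(name, NULL, &hints, &res) != 0)
    return -1;

  int rc = -1;
  for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next)
    {
      const void *addr = NULL;

      if (ai->ai_family == AF_INET)
        addr = &((const struct sockaddr_in *) ai->ai_addr)->sin_addr;
      else if (ai->ai_family == AF_INET6)
        addr = &((const struct sockaddr_in6 *) ai->ai_addr)->sin6_addr;

      if (addr && inet_ntop(ai->ai_family, addr, out, out_len) != NULL)
        {
          rc = 0;
          break;
        }
    }

  freeaddrinfo(res);
  return rc;
}
```

A numeric input such as "127.0.0.1" is converted without any network lookup, while a real hostname triggers a resolver query whose latency is exactly what the non-blocking design has to hide.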

Mentor: László Várady

Difficulty: easy

Project size: 175 hours

Deliverables of the project:

  • The feature implemented
  • Functional tests
  • Documentation

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C
  • Familiarity with DNS

What the student will learn:

  • Basic DNS resolution methods in C, and how the blocking nature of DNS resolution affects a fully async/non-blocking system
  • How to implement a new template function for syslog-ng in C, following clean code principles
  • How to get code reviewed and merged in an open-source project

csv-parser() enhancements: $(list-*) template function compatible output

Brief description:

syslog-ng supports parsing CSV-like (comma-separated values) data from messages with its csv-parser(). The parsed data is stored in structured key-value pairs that can be processed further or used when generating output (for example, JSON) for destinations.

Currently, csv-parser() has a mandatory option, columns("KEY1", "KEY2", ...), which specifies the names of the key-value pairs to parse the fields into.

The goal of this project idea is to make the columns() option optional and provide an alternative method for storing CSV fields: syslog-ng lists. Instead of specifying the name of each column, a list should be created out of those CSV fields, which can be manipulated with $(list-*) template functions.
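A minimal C sketch of turning one CSV line into a single comma-separated list value instead of named columns. The quoting rule used here (double quotes around elements that contain a comma, quote, or backslash, with backslash escapes inside) is an assumption about syslog-ng's list representation and should be checked against the existing $(list-*) implementation; csv_to_list() and its helpers are hypothetical, and input quoting is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed rule: elements with these characters must be quoted in a list. */
static bool
needs_quoting(const char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (s[i] == ',' || s[i] == '"' || s[i] == '\\')
      return true;
  return false;
}

/* Appends c, keeping out NUL-terminated; false if out is full. */
static bool
put_char(char *out, size_t out_size, size_t *pos, char c)
{
  if (*pos + 1 >= out_size)
    return false;
  out[(*pos)++] = c;
  out[*pos] = '\0';
  return true;
}

/* Hypothetical: splits line on delimiter and writes the fields as one list. */
static bool
csv_to_list(const char *line, char delimiter, char *out, size_t out_size)
{
  size_t pos = 0;
  const char *field = line;

  if (out_size == 0 || delimiter == '\0')
    return false;
  out[0] = '\0';

  for (;;)
    {
      const char *end = strchr(field, delimiter);
      size_t len = end ? (size_t) (end - field) : strlen(field);
      bool quote = needs_quoting(field, len);

      if (field != line && !put_char(out, out_size, &pos, ','))
        return false;
      if (quote && !put_char(out, out_size, &pos, '"'))
        return false;
      for (size_t i = 0; i < len; i++)
        {
          bool escape = quote && (field[i] == '"' || field[i] == '\\');
          if (escape && !put_char(out, out_size, &pos, '\\'))
            return false;
          if (!put_char(out, out_size, &pos, field[i]))
            return false;
        }
      if (quote && !put_char(out, out_size, &pos, '"'))
        return false;

      if (!end)
        return true;
      field = end + 1;
    }
}
```

For example, the line foo;bar,baz;qux split on ";" becomes the three-element list foo,"bar,baz",qux, so the embedded comma does not create a fourth element when the value is later processed by a list function.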

Proposed by: Balazs Scheidler

Mentor: Balazs Scheidler, László Várady

Difficulty: easy

Project size: 175 hours

Deliverables of the project:

  • The feature implemented
  • Unit tests
  • Documentation

Desirable skills:

  • Familiarity with syslog-ng
  • Familiarity with C

What the student will learn:

  • How efficient parsers are implemented and work in syslog-ng
  • How to get code reviewed and merged in an open-source project

Support Elastic's beats protocol

Brief description:

Add support for Elastic's beats protocol, so that syslog-ng can natively interface with any Elastic agent.
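Beats ship events over the Lumberjack v2 protocol. As we understand its framing (to be verified against Elastic's protocol description): every frame starts with a version byte '2' and a frame-type byte; 'W' announces a window size, 'J' carries a sequence number and a JSON payload, 'C' wraps zlib-compressed frames, and the receiver replies with 'A' frames carrying the sequence number acknowledged so far; integers are 32-bit big-endian. The sketch below decodes only 'J' frames and encodes ACKs; BeatsJsonFrame and the functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a decoded Lumberjack v2 JSON data frame. */
typedef struct
{
  uint32_t seq;                 /* sequence number, echoed back in ACKs */
  const unsigned char *payload; /* JSON document, not NUL-terminated */
  uint32_t payload_len;
  size_t size;                  /* total frame size in bytes */
} BeatsJsonFrame;

static uint32_t
read_u32_be(const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

static void
write_u32_be(unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v >> 24);
  p[1] = (unsigned char) (v >> 16);
  p[2] = (unsigned char) (v >> 8);
  p[3] = (unsigned char) v;
}

/* Parses '2' 'J' seq(4) len(4) payload(len).
 * Returns 1 for a complete frame, 0 if more data is needed, -1 if the
 * bytes are not a v2 JSON frame (other frame types are not handled here). */
static int
parse_json_frame(const unsigned char *buf, size_t len, BeatsJsonFrame *frame)
{
  if (len < 2)
    return 0;
  if (buf[0] != '2' || buf[1] != 'J')
    return -1;
  if (len < 10)
    return 0;

  uint32_t payload_len = read_u32_be(buf + 6);
  if (len - 10 < payload_len)
    return 0;

  frame->seq = read_u32_be(buf + 2);
  frame->payload = buf + 10;
  frame->payload_len = payload_len;
  frame->size = 10 + (size_t) payload_len;
  return 1;
}

/* Encodes '2' 'A' seq(4), acknowledging events up to seq. */
static size_t
encode_ack(unsigned char out[6], uint32_t seq)
{
  out[0] = '2';
  out[1] = 'A';
  write_u32_be(out + 2, seq);
  return 6;
}
```

Sending ACKs only after the parsed events have been accepted by syslog-ng's own flow control would let a Beats sender resend unacknowledged events, which is how the protocol's delivery guarantee could map onto syslog-ng's.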

Mentor: Balazs Scheidler, László Várady

Difficulty: hard

Project size: 350 hours

Ideas that still need detailing

  1. Add support for Elastic's beats protocol, so that syslog-ng can natively interface with any Elastic agent.
  2. Add IPv4 CIDR-based matching to add-contextual-data(); once completed, add IPv6 too.
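For the CIDR idea, the core check is a masked comparison: an IPv4 address belongs to a.b.c.d/n when its top n bits equal those of the network address. A minimal C sketch follows; parse_cidr4() and ipv4_in_cidr() are hypothetical helpers, not part of add-contextual-data(), and IPv6 support would need the same comparison over 128 bits, for example byte by byte.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: parses "a.b.c.d/len" into a host-order address and prefix. */
static bool
parse_cidr4(const char *cidr, uint32_t *network, unsigned *prefix)
{
  char addr_part[INET_ADDRSTRLEN];
  const char *slash = strchr(cidr, '/');

  if (!slash || (size_t) (slash - cidr) >= sizeof(addr_part))
    return false;
  memcpy(addr_part, cidr, (size_t) (slash - cidr));
  addr_part[slash - cidr] = '\0';

  char *end;
  unsigned long len = strtoul(slash + 1, &end, 10);
  if (end == slash + 1 || *end != '\0' || len > 32)
    return false;

  struct in_addr a;
  if (inet_pton(AF_INET, addr_part, &a) != 1)
    return false;

  *network = ntohl(a.s_addr);
  *prefix = (unsigned) len;
  return true;
}

/* Hypothetical: true if the dotted-quad ip falls inside cidr. */
static bool
ipv4_in_cidr(const char *ip, const char *cidr)
{
  uint32_t network;
  unsigned prefix;
  struct in_addr a;

  if (!parse_cidr4(cidr, &network, &prefix) || inet_pton(AF_INET, ip, &a) != 1)
    return false;

  /* Shifting a 32-bit value by 32 is undefined in C, so /0 is special-cased. */
  uint32_t mask = prefix == 0 ? 0 : UINT32_C(0xFFFFFFFF) << (32 - prefix);
  return (ntohl(a.s_addr) & mask) == (network & mask);
}
```

For example, 192.168.1.77 matches 192.168.1.64/26 (mask 255.255.255.192), while 192.168.1.130 does not.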