Add Syslog Source #4511

Open
kclinden opened this issue May 7, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


kclinden commented May 7, 2024

Is your feature request related to a problem? Please describe.
I need to be able to collect raw syslog traffic from endpoints such as network devices. Today this requires a log collector in addition to Data Prepper, such as Logstash.

Describe the solution you'd like
Add a syslog source plugin similar to the Logstash syslog input:
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-syslog.html

Describe alternatives you've considered (Optional)
Using Logstash or Fluent Bit instead :(

Additional context
Add any other context or screenshots about the feature request here.

@kkondaka kkondaka added enhancement New feature or request and removed untriaged labels May 7, 2024
Collaborator

KarstenSchnitter commented May 15, 2024

Thanks for opening this issue. It is related to #2162. Let me share a few thoughts on your request.

The syslog protocol is specified in RFC5424. There are two transport mappings: TLS over TCP in RFC5425 and UDP in RFC5426. This separation indicates the required implementations:

  1. There needs to be a TCP source with TLS support, as required by RFC5424. UDP support is only recommended ("SHOULD"), so it can be added later.
  2. There needs to be a syslog processor to parse the message format specified in RFC5424. This can be done with a grok configuration, but some performance optimisations will probably be needed, which might require a separate processor. A separate processor might also be necessary to correctly map the predefined values within certain fields.
  3. The syslog event contains a message (MSG) as payload. Data Prepper should at least support JSON parsing for this message out of the box. Again, this can be part of a pipeline. A full configuration example will probably have several steps.
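
To illustrate steps 2 and 3, here is a minimal Python sketch of parsing an RFC5424 line and then trying to decode MSG as JSON. It is not Data Prepper code: the names `RFC5424_LINE`, `parse_rfc5424` and the output key `msg_json` are made up for this example, and the regular expression only covers the basic header layout, not every detail of the RFC's grammar.

```python
import json
import re

# Hypothetical pattern for illustration: PRI, VERSION, five header fields,
# STRUCTURED-DATA and an optional MSG, separated by single spaces.
RFC5424_LINE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>[1-9]\d{0,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<structured_data>-|(?:\[(?:[^\]\\]|\\.)*\])+)"
    r"(?: (?P<msg>.*))?$",
    re.DOTALL,
)


def parse_rfc5424(line):
    """Return a dict of syslog fields, or None if the line does not match."""
    match = RFC5424_LINE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # PRI encodes facility * 8 + severity.
    event["facility"], event["severity"] = divmod(int(event.pop("pri")), 8)
    event["version"] = int(event["version"])
    # "-" is the NILVALUE; represent it as None.
    for key in ("timestamp", "hostname", "app_name", "procid", "msgid",
                "structured_data"):
        if event[key] == "-":
            event[key] = None
    # Step 3: try to decode MSG as a JSON object, keep the raw text either way.
    if event["msg"]:
        try:
            decoded = json.loads(event["msg"])
        except ValueError:
            decoded = None
        if isinstance(decoded, dict):
            event["msg_json"] = decoded
    return event
```

For example, `parse_rfc5424('<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - {"user": "lonvick"}')` yields facility 4, severity 2 and a decoded `msg_json` object. A real processor would also have to split STRUCTURED-DATA into its elements and map facility and severity numbers to their names.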

From my experience with different logging systems within SAP, syslog has several challenges. Correctly handling the TCP connections with regard to load balancing and keep-alives is not easy. There is likely to be a great variety of load scenarios. There might be single applications that are silent for a long period but keep the TCP connection open. Data Prepper needs to manage its resources well in that scenario. On the other hand, there might be a really high-throughput rsyslog process firing hundreds of thousands of messages per second. Here, buffering and throughput are a challenge. Proper backpressure at the TCP ACK level would be a good idea. In summary, a TCP source for Data Prepper should expose the necessary configuration options to tune it to those situations. The TCP input plugin of Logstash is very basic and does not perform particularly well in either of those situations. If UDP transport is used, the issues of the TCP connection state are absent, but so is any backpressure mechanism. An overloaded Data Prepper can only drop logs or crash entirely.
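
To make the backpressure idea concrete, here is a small Python sketch using `asyncio`. It assumes newline-delimited messages and a bounded in-memory buffer; `read_into_buffer` and the buffer size are hypothetical names and values for this example, not anything Data Prepper provides.

```python
import asyncio


async def read_into_buffer(reader, buffer):
    """Copy newline-delimited messages from a stream into a bounded queue.

    When the queue is full, put() suspends this coroutine, so nothing more
    is read from the connection. On a real socket the receive buffers then
    fill up and TCP flow control slows the sender down.
    """
    while True:
        line = await reader.readline()
        if not line:  # EOF: the peer closed the connection
            break
        await buffer.put(line.rstrip(b"\n"))
```

With `buffer = asyncio.Queue(maxsize=N)`, the value of `N` is exactly the kind of tuning option mentioned above: a small value protects memory when many idle connections are open, a large one absorbs bursts from a high-throughput sender.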

RFC5424 is relatively strict on the format of most of its fields, but not all syslog generators follow it closely. There might be deviations, for example in the date format, or in the use of quotation marks (") to delimit fields and allow spaces within them. The latter is done by CloudFoundry, for example. Ideally, Data Prepper would be resilient against those particularities. As always, parsing may still fail, and such messages need to be sent to a DLQ.
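
A rough Python sketch of that kind of resilience: accept several timestamp formats, allow quoted fields containing spaces, and route anything that still fails to a DLQ. The input is deliberately simplified (timestamp, hostname, app name, message; no PRI or version), and `TIMESTAMP_FORMATS`, `split_fields` and `route` are hypothetical names chosen for this example.

```python
import re
from datetime import datetime

# Assumed set of accepted timestamp layouts; a real source would make this configurable.
TIMESTAMP_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f%z",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S",
)
# A field is either "quoted text with spaces" or a run of non-space characters.
FIELD = re.compile(r'"(?P<quoted>[^"]*)"|(?P<bare>\S+)')


def parse_timestamp(text):
    """Try each known format in turn; raise ValueError if none matches."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


def split_fields(text, count):
    """Split off `count` leading fields and return (fields, remaining text)."""
    fields, pos = [], 0
    for _ in range(count):
        match = FIELD.match(text, pos)
        if match is None:
            raise ValueError("too few header fields")
        quoted = match.group("quoted")
        fields.append(quoted if quoted is not None else match.group("bare"))
        pos = match.end()
        if text[pos:pos + 1] == " ":  # skip the single separator space
            pos += 1
    return fields, text[pos:]


def route(lines):
    """Parse each line; lines that fail go to the DLQ with the error text."""
    parsed, dlq = [], []
    for line in lines:
        try:
            (timestamp, hostname, app_name), msg = split_fields(line, 3)
            parsed.append({
                "timestamp": parse_timestamp(timestamp),
                "hostname": hostname,
                "app_name": app_name,
                "msg": msg,
            })
        except ValueError as err:
            dlq.append({"raw": line, "error": str(err)})
    return parsed, dlq
```

Keeping the raw line and the error together in the DLQ entry makes it possible to replay the message once the parser has been taught the new variant.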

3 participants