Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add journald config #290

Merged
merged 62 commits into from
Jan 21, 2022
Merged

Add journald config #290

merged 62 commits into from
Jan 21, 2022

Conversation

luckyj5
Copy link
Contributor

@luckyj5 luckyj5 commented Nov 16, 2021

Added journald configuration under otel logs collection.

@luckyj5 luckyj5 requested review from a team as code owners November 16, 2021 07:21
@luckyj5 luckyj5 requested review from rockb1017 and removed request for a team November 19, 2021 19:09
@luckyj5 luckyj5 marked this pull request as draft November 25, 2021 19:12
@dmitryax
Copy link
Contributor

dmitryax commented Jan 19, 2022

Did you test? It doesn't for me now. I'm getting these errors

2022-01-19T05:40:45.599Z	error	Failed to process entry	{"kind": "receiver", "name": "journald", "operator_id": "$.metadata", "operator_type": "metadata", "error": {"description": "failed to add resource keys to entry: render embedded expression: invalid operation: string + <nil> (1:8)\n | \"kube:\"+$._SYSTEMD_UNIT\n | .......^"}, "action": "send", "entry": {"timestamp":"2022-01-10T12:59:37.414365Z","body":{"MESSAGE":"Disconnected from authenticating user root 101.42.101.139 port 45992 [preauth]","PRIORITY":"6","SYSLOG_FACILITY":"4","SYSLOG_IDENTIFIER":"sshd","SYSLOG_PID":"3043694","SYSLOG_TIMESTAMP":"Jan 10 12:59:37 ","_BOOT_ID":"94dd12ba0583423c8703e9195b8074b6","_GID":"0","_HOSTNAME":"ip-172-20-36-112","_MACHINE_ID":"ec2ffb32d5187405b7eac9b9a1b87a4b","_PID":"3043694","_SOURCE_REALTIME_TIMESTAMP":"1641819577413275","_TRANSPORT":"syslog","_UID":"0","__CURSOR":"s=1e88599f1b0b4b3ab1eb058fa87c8ba5;i=98a81d;b=94dd12ba0583423c8703e9195b8074b6;m=7f4c4d9d511;t=5d539e8b272dd;x=5111cb02c2128825","__MONOTONIC_TIMESTAMP":"8747856024849"},"severity":0}}

Looks like _SYSTEMD_UNIT is not passed by the operator. Not sure why.

@luckyj5
Copy link
Contributor Author

luckyj5 commented Jan 19, 2022

Did you test? It doesn't for me now. I'm getting these errors

2022-01-19T05:40:45.599Z	error	Failed to process entry	{"kind": "receiver", "name": "journald", "operator_id": "$.metadata", "operator_type": "metadata", "error": {"description": "failed to add resource keys to entry: render embedded expression: invalid operation: string + <nil> (1:8)\n | \"kube:\"+$._SYSTEMD_UNIT\n | .......^"}, "action": "send", "entry": {"timestamp":"2022-01-10T12:59:37.414365Z","body":{"MESSAGE":"Disconnected from authenticating user root 101.42.101.139 port 45992 [preauth]","PRIORITY":"6","SYSLOG_FACILITY":"4","SYSLOG_IDENTIFIER":"sshd","SYSLOG_PID":"3043694","SYSLOG_TIMESTAMP":"Jan 10 12:59:37 ","_BOOT_ID":"94dd12ba0583423c8703e9195b8074b6","_GID":"0","_HOSTNAME":"ip-172-20-36-112","_MACHINE_ID":"ec2ffb32d5187405b7eac9b9a1b87a4b","_PID":"3043694","_SOURCE_REALTIME_TIMESTAMP":"1641819577413275","_TRANSPORT":"syslog","_UID":"0","__CURSOR":"s=1e88599f1b0b4b3ab1eb058fa87c8ba5;i=98a81d;b=94dd12ba0583423c8703e9195b8074b6;m=7f4c4d9d511;t=5d539e8b272dd;x=5111cb02c2128825","__MONOTONIC_TIMESTAMP":"8747856024849"},"severity":0}}

Looks like _SYSTEMD_UNIT is not passed by the operator. Not sure why.

@dmitryax Thanks for checking. It was working fine during my testing as I was testing with units being present in the values.yaml.
Updated the template to remove _SYSTEMD_UNIT when journald.units are NOT provided in the config.

@dmitryax
Copy link
Contributor

dmitryax commented Jan 20, 2022

This is sad that not specifying units explicitly doesn't give us journald.unit.name nor com.splunk.sourcetype. Without those fields it's in a kind of a broken state. For me as a user, I don't think it's even useful if I cannot understand which service a message is coming from.

Do you know if this is a journald limitation, or something that can be fixed in journald operator of otel logs collection library?

If we cannot do anything with that on otel collector side, I would suggest even removing the option to collect logs from all units and always require specifying the units explicitly. We can have 3 units enabled by default as you provided initially in this PR, but that it's introducing 3 new receivers with the current implementation. What if we switch to the 3 units specified explicitly, but keep journald logs collection disabled by default. And once this issue is resolved, we enable it by default. How does it sound?

@luckyj5
Copy link
Contributor Author

luckyj5 commented Jan 20, 2022

This is sad that not specifying units explicitly doesn't give us journald.unit.name nor com.splunk.sourcetype. Without those fields it's in a kind of a broken state. For me as a user, I don't think it's even useful if I cannot understand which service a message is coming from.

Do you know if this is a journald limitation, or something that can be fixed in journald operator of otel logs collection library?

If we cannot do anything with that on otel collector side, I would suggest even removing the option to collect logs from all units and always require specifying the units explicitly. We can have 3 units enabled by default as you provided initially in this PR, but that it's introducing 3 new receivers with the current implementation. What if we switch to the 3 units specified explicitly, but keep journald logs collection disabled by default. And once this issue is resolved, we enable it by default. How does it sound?

@dmitryax
The way journalctl works is if called without parameters, it will show the full contents of the journal, starting with the oldest entry collected. If one or more match arguments are passed, the output is filtered accordingly.
Read more at: https://www.commandlinux.com/man-page/man1/journalctl.1.html

Units are configurable and thats how Journald input works, it provides unit value if its a part of the supplied configuration.

Units parameter supports PATTERN match, if configured, units parameter show messages for the specified systemd unit UNIT, or for any of the units matched by PATTERN. So potentially we can collect all units (when not configured) by using a PATTERN that will match all units. (This will require a change on journald input operator side)

Question is, do we want to collect all units by default when units are not configured in values.yaml There can be many service units enabled and might not all be that useful to the user? Eg:
image

The reason we are not collecting all by default is, it creates lot of noise, and hence we started this PR with Units being required field (with default values) which we did not like as we were trying to go away from more required parameters in the configuration.

Here are the options I think we have:

  1. Leave it as is where if units are not configured, unit.name and sourcetype wont have service name. Thats how journalctl works as well. We can have any other value for sourcetype (eg: kube:journald?)
  2. Update the config to have 3 default unit values (make it a required field), keep journald disabled by default and enable it once this issue is resolved.
  3. If we are interested in collecting all the units, update journald-input operator and then update the template on our side.

I know we are trying to have the best possible version of journald data collection with more flexibility and controlled filtering, but if something seems not reasonable or not user friendly, we can always update it to go with the best possible approach based on the feedback we receive from the users or community?

Please let me know what you think and which option seems reasonable as of now? Thank you!

@dmitryax
Copy link
Contributor

dmitryax commented Jan 20, 2022

Thanks for providing the detailed answer.

  1. Leave it as is where if units are not configured, unit.name and sourcetype wont have service name. Thats how journalctl works as well. We can have any other value for sourcetype (eg: kube:journald?)

As a user, I don't really like this solution. I always want to know which service my logs are coming from. And I'm sure this would be expected behavior from other users.

  1. Update the config to have 3 default unit values (make it a required field), keep journald disabled by default and enable it once this issue is resolved.

I think for now we should go this route and completely remove an option to specify all units.

Once we have a solution on the operator side as use suggested (3), we can add the all units option back, but still probably not as a default behavior since you mentioned that it's noisy. Probably we don't even need to approach the all units option (3) until there is an ask for that from users.

What do you think?

@luckyj5
Copy link
Contributor Author

luckyj5 commented Jan 20, 2022

What do you think?

Sounds good to me, we can have all units option if/ when there is an ask.

@luckyj5
Copy link
Contributor Author

luckyj5 commented Jan 20, 2022

@dmitryax Updated the PR back to have units configured by default, and disable journald by default in the config.
Let me know if anything. Thanks!

CHANGELOG.md Outdated Show resolved Hide resolved
luckyj5 and others added 3 commits January 20, 2022 09:53
Co-authored-by: Dmitrii Anoshin <anoshindx@gmail.com>
@luckyj5
Copy link
Contributor Author

luckyj5 commented Jan 20, 2022

@dmitryax updated.
Let me know if there is anything which needs to be updated. Thanks!

Copy link
Contributor

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me! Just one last question

@luckyj5 luckyj5 requested a review from dmitryax January 21, 2022 00:09
Copy link
Contributor

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks!

@dmitryax dmitryax merged commit fed9f0e into signalfx:main Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants