Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(outputs): Add Zabbix plugin #13739

Merged
merged 4 commits into from Jan 24, 2024

Conversation

adrianlzt
Copy link
Contributor

Add output plugin to support sending metrics to Zabbix (https://www.zabbix.com/).

This output plugin handle sending metrics as traps, generating LLD data to feed discovery rules and is able to send autoregistration requests.

Required for all PRs

Supersedes PR #3966

Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adrianlzt for your contribution! I have some comments in the code...

plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
@srebhan srebhan self-assigned this Aug 8, 2023
@srebhan srebhan added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins labels Aug 8, 2023
@srebhan srebhan changed the title feat: add Zabbix output plugin feat(outputs): Add Zabbix plugin Aug 8, 2023
@adrianlzt adrianlzt force-pushed the feature/zabbix_output branch 2 times, most recently from c180c6d to 6a7663d Compare August 9, 2023 07:34
@adrianlzt adrianlzt marked this pull request as draft August 11, 2023 07:07
@adrianlzt adrianlzt marked this pull request as ready for review August 14, 2023 19:25
@adrianlzt
Copy link
Contributor Author

@srebhan let me know if that's ok to squash all commits

Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice update @adrianlzt! There are a few minor comments and two larger ones, namely the removal of elementsMatch (replace by a hash comparison) and replacing assert with require in the tests.

Looking forward to the next round! Please note that I will likely not find time to do another round of reviews this week and next week...

plugins/outputs/zabbix/autoregister_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
Copy link
Contributor

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice update @adrianlzt! Sorry for the late response but even I need some holidays... ;-)

I have some more comments and one suggestion for the LLD implementation. What do you think?

plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
Comment on lines 151 to 148
// Send empty LLDs for the LLDs that were sent the last time but not this time.
for key := range zl.previousReceivedData {
emptyDataValuesJSON, err := json.Marshal(map[string]interface{}{"data": []interface{}{}})
if err != nil {
zl.log.Warnf("Marshaling to JSON empty data Zabbix format: %v", err)

continue
}

// Add the LLD m
m := metric.New(
lldName,
map[string]string{
hostTag: key.Hostname,
},
map[string]interface{}{
key.LLDKey: emptyDataValuesJSON,
},
time.Now(),
)
metrics = append(metrics, m)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block does not what the comment says. It unconditionally sends an empty value for each previously seen data item. Shouldn't it check somewhere if we also do have data in the current metric set!?!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improved command and added an explicit test on c86330c.
In this loop, zl.previousReceivedData, only stores LLDs not seen this time.

plugins/outputs/zabbix/lld.go Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Show resolved Hide resolved
@srebhan
Copy link
Contributor

srebhan commented Sep 1, 2023

@adrianlzt any thoughts on my LLD suggestion? I can also push it to your repo if you want (and allow it)!?

@srebhan
Copy link
Contributor

srebhan commented Sep 11, 2023

@adrianlzt any update on the lld part?

plugins/outputs/zabbix/README.md Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/lld.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix.go Outdated Show resolved Hide resolved
Copy link
Contributor

@Hipska Hipska left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a large review for the tests

plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/README.md Outdated Show resolved Hide resolved
plugins/outputs/zabbix/utils.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/autoregister_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/autoregister_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix_test.go Outdated Show resolved Hide resolved
plugins/outputs/zabbix/zabbix_test.go Outdated Show resolved Hide resolved
@srebhan
Copy link
Contributor

srebhan commented Sep 26, 2023

@adrianlzt any update to @Hipska's comments?

@adrianlzt
Copy link
Contributor Author

Sorry, quite busy right now. I will try to get back to this by the end of this week.

@adrianlzt
Copy link
Contributor Author

I have not forgotten this. I will come back to this in a couple of weeks. Sorry

@Hipska Hipska added the waiting for response waiting for response from contributor label Oct 16, 2023
@srebhan
Copy link
Contributor

srebhan commented Oct 20, 2023

@adrianlzt can you please at least send a keep-alive every second week so we know you are still looking after this!?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Oct 20, 2023
@srebhan srebhan added the waiting for response waiting for response from contributor label Oct 20, 2023
@adrianlzt
Copy link
Contributor Author

Still working on it.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Nov 2, 2023
@Hipska
Copy link
Contributor

Hipska commented Nov 2, 2023

Please mark it as a draft in between..

adrianlzt added a commit to datadope-io/telegraf that referenced this pull request Jan 23, 2024
influxdata#13739 (comment)

I have made some modifications to be able to change the hostTag and to
clear the current map after each push:
	zl.current = make(map[uint64]lldInfo, len(zl.current))

This implementation fails with the test
"TestAddAndPush/one metric changes the value of the tag, it should send the new value and not send and empty lld"
It tries to send an empty LLD if the content of the LLD changes, but
that is not correct.

Example of what the current implementation is doing:
disk host=foo,disk=sda free=10
--push (LLD {#DISK}=sda)
disk host=foo,disk=sdb free=10
--push (LLD {#DISK}=sdb) + empty LLD

That last empty LLD should not be sent, as it will be received after the
{#DISK}=sdb and tell zabbix the are no values for that LLD.

The problem resides in how hash are created and handled in the Push()
function.
The hash changes if the content of data changes, but does not take into
account that the same lld (hostname+key) with different values should
avoid sending an empty LLD.
adrianlzt added a commit to datadope-io/telegraf that referenced this pull request Jan 23, 2024
@adrianlzt
Copy link
Contributor Author

@powersj , that's correct. In the normal zabbix-server - zabbix-agent configuration, the low level discovery is configured to run each N minutes. Once the zabbix-server has received the discovery and created the items, it ask to zabbix-agent for those new items.

So zabbix-agent will not send the data until the LLD has been executed once. Telegraf will send metrics but will not be accepted till LLD has been received.

@Hipska
Copy link
Contributor

Hipska commented Jan 23, 2024

Should I squash all comits into the first one?

If you are going to do a force push with all commits again, then yes, that might be better indeed. Now it is polluting the PR history a bit 😛

Add output plugin to support sending metrics to Zabbix
(https://www.zabbix.com/).

This output plugin handle sending metrics as traps, generating LLD data
to feed discovery rules and is able to send autoregistration requests.
@adrianlzt
Copy link
Contributor Author

Sorry, was a rebase to fix the go.mod conflict.

@powersj
Copy link
Contributor

powersj commented Jan 23, 2024

@adrianlzt,

Thanks for the updates and clarifications. I've added a few comments and need to further understand this experience:

Telegraf will send metrics but will not be accepted till LLD has been received.

What does the user see until the LDD is run? Are metrics reported as successfully written by telegraf? Does the agent cache those metrics? Is there a limit on the cache?

I am asking as we had a user report this similar behavior with another output, which has a second level of cache and was confused when telegraf said things are writing successfully, but the metrics were not getting to the server.

@srebhan
Copy link
Contributor

srebhan commented Jan 24, 2024

@adrianlzt along the lines of @powersj, can we force an LLD update before sending the first batch?

@adrianlzt
Copy link
Contributor Author

The user, from the zabbix-web perspective, won't see any items related to that LLD until it is send.

The typical example is disk monitoring. Zabbix define a discovery rule that will receive which disks are present in the server . Once the lld data is received it will create the items for each disk, for example:

  • telegraf.disk.free[sda]
  • telegraf.disk.free[sdb]

From the telegraf point of view, the metrics send to those items will fail until they are created. The problem is telegraf could not known which metrics are being rejected because zabbix response give just a total count of passed and failed metrics:

[{"response":"success","info":"processed: 0; failed: 1; total: 1; seconds spent: 0.000046"}]

One option could be to send one metric at a time, but that will be completely inefficient.

So, currently, telegraf send metrics that are ignored by zabbix.

The option to send first the LLD and then metrics could be an option, but I see a timing problem.
How much telegraf is going to wait queueing those metrics?

Currently, telegraf gather info for lld_send_interval before sending the LLD.
That is used to be able to form the LLD completely.
What I mean is that two different inputs could be aggregated in the same LLD. Probably not the more common scenario, but still an option.

Example, two different inputs, with different gather intervals that generate these metrics:

disk,host=foo,name=sda time=123
disk,host=foo,name=sda time=145

If telegraf send the LLD just before the sda metric is generated, zabbix will think that only this "sda" is present. If telegraf happens to be just restarted, and zabbix knew that disks sda and sdb exists, that LLD will make zabbix delete, or mark to delete, the items of the disk sdb.

I think the current behaviour is ok taking into account the differences between how zabbix-agent and telegraf works.

@srebhan
Copy link
Contributor

srebhan commented Jan 24, 2024

So Zabbix will delete the series (i.e. the previously collected data) if LLD omits it?

Then we do have a timing problem anyway. What if we get a rush in of metrics and they are presented as multiple batches to the output in a way that the LLD interval is between them?

@telegraf-tiger
Copy link
Contributor

@adrianlzt
Copy link
Contributor Author

Zabbix has a configuration option to decide what to do with lost data in the LLD. The default is to keep them 30 days.
image

If that value is set to 0, ther are deleted instantly.

I don't see your example. The output will collect info for 10' (default value), that should be enough to get at least one round of all metrics.
If a metric is configured with a higher interval, it could be marked to be deleted.

I think that is a corner case and it will be normally not pose a problem, as zabbix won't delete the resources inmediatly. But yes, tunning the gather interval, lld interval and keep lost resources period coud lead to items deleted and recreated.

@srebhan
Copy link
Contributor

srebhan commented Jan 24, 2024

But in this sense, sending an LLD as soon as we do see new data shouldn't be a problem, is it?
We could additionally save the state of the output consisting of the LLD cache content...

@adrianlzt
Copy link
Contributor Author

adrianlzt commented Jan 24, 2024 via email

@powersj
Copy link
Contributor

powersj commented Jan 24, 2024

@adrianlzt - thanks again for the clarifications.

Can we get you to please add a short explanation in README about this behavior? I worry that this is quite different then the experience with other plugins or other output documentation and I do not want users to get caught off guard. Something to the effect of:

Users need to keep in mind that the metrics will fail to send until the Zabbix Server has received a low-level discovery (LLD) with the metrics. Sending LLD to Zabbix is a heavy-weight process and is only done at the interval per the lld_send_interval setting.

It is possible that a user of Zabbix already knows this, but it would still be good to have this written down so we can reference it.

After that and renaming key config option I believe we are good to go.

Thanks!

@adrianlzt
Copy link
Contributor Author

Added in 335decf

@telegraf-tiger
Copy link
Contributor

Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for driving this to completion!

@powersj powersj merged commit c8e12fa into influxdata:master Jan 24, 2024
26 checks passed
@github-actions github-actions bot added this to the v1.30.0 milestone Jan 24, 2024
@adrianlzt
Copy link
Contributor Author

6 years! Time to celebrate 🥳

@Hipska Hipska mentioned this pull request Jan 24, 2024
@srebhan
Copy link
Contributor

srebhan commented Jan 24, 2024

@adrianlzt thanks for your persistence and patience!

hhiroshell pushed a commit to hhiroshell/telegraf that referenced this pull request Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin new plugin plugin/output 1. Request for new output plugins 2. Issues/PRs that are related to out plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants