
fix(http): Stop plugins from leaking file descriptors on telegraf reload #15213

Merged
1 commit merged on Apr 24, 2024


nick-kentik
Contributor

Summary

On reload, telegraf rebuilds its plugins, which means creating a new *http.Client (and *http.Transport) in each of the files touched in this PR. In general, transports should be long-lived; they carry state and possibly idle HTTP connections.

Without an explicit call to CloseIdleConnections(), and with no idle_conn_timeout set, we leave it to the Go runtime to decide when to drop those connections, and it can easily decide "never". Open connections then accumulate over time, especially when the telegraf config is reloaded repeatedly within a single process, until telegraf eventually fails with EMFILE ("too many open files") errors.

Fix this by calling CloseIdleConnections() when we're done with a given transport, which tells the Go runtime it can release those connections.

Checklist

  • No AI-generated code was used in this PR

Related issues

resolves #15177

nick-kentik changed the title from "plugins: Fix file descriptor leaks on telegraf reload" to "fix: Stop plugins from leaking file descriptors on telegraf reload" on Apr 23, 2024
telegraf-tiger (bot) added the "fix" label (pr to fix corresponding bug) on Apr 23, 2024
@nick-kentik
Contributor Author

Honestly, I don't like this PR very much, but it does solve the fd leaks.

I feel like the common HTTP client could be reworked so that the teardown step is unmissable; there are a few ways to do that. It would also be better if I didn't need all the dummy Start() plugin methods. I wanted to steer clear of big refactorings, though, especially those touching telegraf core.
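One possible way to make teardown unmissable, sketched purely as an illustration and not as what this PR or telegraf actually does: route client creation through a hypothetical clientRegistry that remembers every client it hands out, so whatever drives the reload can release them all in one place instead of relying on each plugin's own Stop() or Close().

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// clientRegistry is a hypothetical helper: every client it creates is
// remembered, so teardown happens centrally rather than in each plugin.
type clientRegistry struct {
	mu      sync.Mutex
	clients []*http.Client
}

// NewClient creates a client with its own transport and registers it.
func (r *clientRegistry) NewClient() *http.Client {
	c := &http.Client{Transport: &http.Transport{}}
	r.mu.Lock()
	r.clients = append(r.clients, c)
	r.mu.Unlock()
	return c
}

// CloseAll releases the idle connections of every registered client and
// forgets them. It returns how many clients were torn down.
func (r *clientRegistry) CloseAll() int {
	r.mu.Lock()
	clients := r.clients
	r.clients = nil
	r.mu.Unlock()
	for _, c := range clients {
		c.CloseIdleConnections()
	}
	return len(clients)
}

func main() {
	var reg clientRegistry
	// A reload rebuilds every plugin, each asking the registry for a client...
	for i := 0; i < 3; i++ {
		_ = reg.NewClient()
	}
	// ...and the caller driving the reload tears them all down in one place.
	fmt.Println("clients closed:", reg.CloseAll())
}
```

The trade-off is that the registry has to be owned by something that outlives a reload, which is exactly the kind of core change the PR deliberately avoided.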

Contributor

srebhan left a comment

Thanks for your fix @nick-kentik! Some comments in the code... I generally like the changes in the code (even though you seem to dislike them ;-)). We could think of adding an idle-timeout setting to the common HTTP client and use it (or disable it) where applicable...

plugins/inputs/elasticsearch/elasticsearch.go (resolved)
plugins/inputs/elasticsearch_query/elasticsearch_query.go (outdated, resolved)
plugins/secretstores/http/http.go (outdated, resolved)
srebhan changed the title from "fix: Stop plugins from leaking file descriptors on telegraf reload" to "fix(http): Stop plugins from leaking file descriptors on telegraf reload" on Apr 23, 2024
srebhan self-assigned this on Apr 23, 2024
srebhan added the "plugin/input", "plugin/output", and "plugin/secretstores" labels on Apr 23, 2024
@nick-kentik
Contributor Author

Thanks @srebhan, suggestions applied.

I don't particularly like this solution, but I like the file descriptor leaks much less 😅. Plenty of time to improve things once the bugs are squashed.

@telegraf-tiger
Contributor

Contributor

srebhan left a comment

Thanks @nick-kentik! Feel free to submit a follow-up PR to add a config option with a reasonable (long) default for the idle timeout!

srebhan added the "ready for final review" label on Apr 23, 2024
srebhan assigned powersj and DStrand1 and unassigned srebhan on Apr 23, 2024
Contributor

powersj left a comment

Thanks for the PR. We had talked about this internally yesterday and agreed that this was the way forward as well, so the timing of your PR was perfect.

powersj removed their assignment on Apr 23, 2024
DStrand1 merged commit 96d6da6 into influxdata:master on Apr 24, 2024 (27 checks passed)
github-actions (bot) added this to the v1.30.3 milestone on Apr 24, 2024
powersj pushed a commit that referenced this pull request on May 20, 2024
Labels
fix, plugin/input, plugin/output, plugin/secretstores, ready for final review
Development
Linked issue: File descriptor / HTTP client connection leak on telegraf reload
4 participants