Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Plugin state-persistence #12166

Merged
merged 15 commits into from
Mar 1, 2023
Merged

Conversation

srebhan
Copy link
Contributor

@srebhan srebhan commented Nov 3, 2022

resolves #8352
replaces #9476

This PR adds the machinery to persist the "state" of a plugin over multiple Telegraf runs. A plugin's state is defined by the plugin itself and only has the requirement of being serializable to JSON.

The framework defines an interface that plugins have to adhere to for providing a state (GetState()) and restore a state on load (SetState()). It furthermore implements a persister to collect and save the states to disk on Telegraf shutdown and to load and distribute the states on Telegraf startup.
At the code of the machinery, the framework will generate a unique ID for each configured plugin instance to relate the saved states to the corresponding plugin instance. The ID generation is based on the TOML configuration of each plugin instances and is consistent over restarts of Telegraf as long as the plugin configuration is unchanged. At this, the ID generation is invariant to order changes or reorganization among config-files as long as no options are added, removed or modified.

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Nov 3, 2022
@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Nov 25, 2022
Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome. Played with this using the tail plugin and successfully saw it re-start and only ingest the new lines. As usual the code looks great, my concerns are about how to interact with the feature:

  1. What is the expected behavior when the state file does not exist? I see an info (not error) log message telling me the file does not exist, but telegraf stops executing.

  2. It appears it is up to the user to create an empty JSON file to store state in initially? This seems unnecessary and not straightforward.

  3. If the config file changes, telegraf does not tell the user that the ids changed compared to what already existed and the state will reset. It think we need to be more noisy about this.

  4. I am wondering if there is a change and the state file will need to be updated, should we keep the old state file around to let a user revert?

  5. Does the persister module need specific unit tests. I see it is used throughout config.go, but wondering if something is needed there.

  6. Need a doc about this feature in two places: a) for developers about what needs to happen when adding this to a plugin and b) to users in general docs

@powersj powersj removed the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jan 25, 2023
@powersj powersj assigned srebhan and unassigned powersj Feb 15, 2023
@srebhan
Copy link
Contributor Author

srebhan commented Feb 23, 2023

Thanks for the review and comments @powersj. Let me answer your points...

  1. What is the expected behavior when the state file does not exist? I see an info (not error) log message telling me the file does not exist, but telegraf stops executing.
  2. It appears it is up to the user to create an empty JSON file to store state in initially? This seems unnecessary and not straightforward.

This was an unintended behavior and should be fixed now. If no state-file exists it will be created by Telegraf on exit so the user has to do nothing.

  1. If the config file changes, telegraf does not tell the user that the ids changed compared to what already existed and the state will reset. It think we need to be more noisy about this.

I'm not sure how to do this. I can report the IDs of the plugins that differ (i.e. that are in the state-file but not the current config and vice versa) but does that make sense to the user? Should I store additional information for this report? If so, what do you envision? The complete plugin config?

  1. I am wondering if there is a change and the state file will need to be updated, should we keep the old state file around to let a user revert?

No. Last time we discussed persistence the consensus was "as simple as possible". If the user wants versioning, he can still wrap telegraf with a script copying the latest state to a telegraf-$(date --iso-8601=seconds).state file before starting telegraf.

  1. Does the persister module need specific unit tests. I see it is used throughout config.go, but wondering if something is needed there.

I don't think so. The config tests completely cover the persister, I don't know how a good non-integrated unit-test should look like.

  1. Need a doc about this feature in two places: a) for developers about what needs to happen when adding this to a plugin and b) to users in general docs

6b) is already in the PR (see change in docs/CONFIGURATION.md) or do you need something else? For 6a), where should I put that documentation?

agent/agent.go Outdated Show resolved Hide resolved
@powersj
Copy link
Contributor

powersj commented Feb 23, 2023

This was an unintended behavior and should be fixed now. If no state-file exists it will be created by Telegraf on exit so the user has to do nothing.

Awesome thank you!

I'm not sure how to do this. I can report the IDs of the plugins that differ (i.e. that are in the state-file but not the current config and vice versa) but does that make sense to the user? Should I store additional information for this report? If so, what do you envision? The complete plugin config?

I need to remind myself of how we review files, but off the top of my head is there a way we can keep the sha256 sum of the config file? and report if it changed?

  1. I am wondering if there is a change and the state file will need to be updated, should we keep the old state file around to let a user revert?

No. Last time we discussed persistence the consensus was "as simple as possible". If the user wants versioning, he can still wrap telegraf with a script copying the latest state to a telegraf-$(date --iso-8601=seconds).state file before starting telegraf.

ok

  1. Need a doc about this feature in two places: a) for developers about what needs to happen when adding this to a plugin and b) to users in general docs

6b) is already in the PR (see change in docs/CONFIGURATION.md) or do you need something else? For 6a), where should I put that documentation?

6a should go into docs/developers maybe a STATE-PERSISTENCE.md?

Thanks again!

@srebhan
Copy link
Contributor Author

srebhan commented Feb 23, 2023

I need to remind myself of how we review files, but off the top of my head is there a way we can keep the sha256 sum of the config file? and report if it changed?

Basically I keep the ID of the plugin. I could annotate this with the plugin name (like inputs.tail), but if you do have multiple instances of that plugin it probably won't help.

6a should go into docs/developers maybe a STATE-PERSISTENCE.md?

Sounds good. Will do.

@powersj
Copy link
Contributor

powersj commented Feb 23, 2023

Basically I keep the ID of the plugin. I could annotate this with the plugin name (like inputs.tail), but if you do have multiple instances of that plugin it probably won't help.

ok thanks for reminding me why this is more complex :) Let's drop this for now.

@telegraf-tiger
Copy link
Contributor

@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Mar 1, 2023
@srebhan srebhan assigned powersj and unassigned srebhan Mar 1, 2023
Copy link
Contributor

@powersj powersj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! I think we will want to a) write up a blog post and b) include a highlight of this in the release notes and make it clear this is a new/early feature.

Thanks again for the work!

@powersj powersj merged commit f87916a into influxdata:master Mar 1, 2023
@jhychan
Copy link
Contributor

jhychan commented Mar 1, 2023

This is awesome, very much looking forward to this!

@knollet
Copy link
Contributor

knollet commented May 8, 2023

Hi, Feature-Request:
could you please create /var/lib/telegraf in the distribution packages for this?
Then there could be
# statefile = "/var/lib/telegraf/telegraf.state"
commented out in agent.conf

That would be the correct location to put this sort of file by default.
It could even be commented in by default, because... why not?

@srebhan
Copy link
Contributor Author

srebhan commented May 19, 2023

@knollet please open an issue for this and do not hide it in an "old" PR!

@srebhan srebhan added this to the v1.26.0 milestone Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add persistent state feature to Telegraf plugins (bookmarking)
5 participants