Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SplunkHECExporter] Add ability to heartbeating destination #19220

Closed
splunkericl opened this issue Mar 3, 2023 · 11 comments
Closed

[SplunkHECExporter] Add ability to heartbeating destination #19220

splunkericl opened this issue Mar 3, 2023 · 11 comments
Labels

Comments

@splunkericl
Copy link
Contributor

Is your feature request related to a problem? Please describe.
In Edge Processor(EP), it is possible the EP of a customer would have very low traffic for some periods of time. EP adds the ability to heartbeat the destination periodically to check the validity of the destination. For example, the S3 exporter in EP periodically calls listObjects to validate the credentials are still valid.

One of the requirement to incorporate SplunkHECExporter into Edge Processor(EP) is to adds this heart-beating ability to SplunkHECExporter and adds observability data on top of it.

Describe the solution you'd like
Add an configuration option to allow heartbeating in SplunkHECExporter. If enabled the exporter would spawn a go routine that periodically sends dummy info to the destination. The dummy info can be similar to how a splunk forwarder is sending:

  • ForwarderInfo build=%s version=%s os=%s arch=%s hostname=%s guid=%s fwdType=%s ssl=true lastIndexer=%s
  • this info is sent to _internal index so it would work out of the box.

Describe alternatives you've considered
We can leverage the heartbeat information by checking how many events were sent successfully. However, this doesn't address the users that won't have data sent for a while.

@bogdandrutu bogdandrutu transferred this issue from open-telemetry/opentelemetry-collector Mar 6, 2023
@atoulme atoulme added enhancement New feature or request exporter/splunkhec labels Mar 8, 2023
@splunkericl
Copy link
Contributor Author

I am picking up this work and hoping to get some guidance. I am thinking of two approaches for the part to emit observability data for the heartbeat:

  • Add a onHeartbeatResult function in the Config for SplunkHECExporter so users can specify the action for each heartbeat. Only enable heartbeating if onHeartbeatResult is provided.
    • Pro: Users can customize their actions for heartbeating result, success or failure. Use the framework of their choice for observability data.
    • Con: function can't be specified in json or yaml file. To use heartbeating function, users must pass in the config programatically
  • Add observability inside SplunkHECExporter for heartbeating. Create a wrapper similar to how obsreport is to emit metrics either using opencensus or openTelemetry
    • Pro: every user of SplunkHECExporter can get the observability out of the box. They can enable it through json and yaml.
    • Con: I don't see a pattern of each exporter having their own observability data. Not sure if this is against the principle of otel framework.

@atoulme @dmitryax do you have any advices on the approaches?

@dmitryax
Copy link
Member

The target audience of the contrib components is the end users. Let's keep it that way end expose only user-configurable options, not programmatic. So let's add the observability data right in the exporter.

The exporter already has health_check_enabled config option. It's currently triggered only during the start time, but I believe it can be enhanced to make such periodic calls and emit additional metrics based on that. If the option is disabled, which should be by default, no additional heartbeat observability metrics are emitted.

cc @wojtekzyla who introduced that option

@atoulme
Copy link
Contributor

atoulme commented Mar 10, 2023

The health_check_enabled performs a HTTP check against /collector/event/health, and has a different function as it captures the health of the remote counterpart. Here, we want to send the status of the collector to the remote counterpart on a regular basis.

Agree on starting from the user config first.

I would recommend we look at two additional config entries under a heartbeat key with the following defaults:

  heartbeat:
    enabled: false
    interval: 30s

We generate the log message to send based off component.BuildInfo which is passed in when creating the exporter.

@dmitryax
Copy link
Member

dmitryax commented Mar 10, 2023

Here, we want to send the status of the collector to the remote counterpart on a regular basis.

Is that the case? IFAIU, the goal is still to validate the destination, according to @splunkericl, no?

EP adds the ability to heartbeat the destination periodically to check the validity of the destination

I'd like to reuse the configuration field for something serving the same purpose. It's better to avoid adding new fields if they are not strictly necessary to reduce the mental load on the end user

@splunkericl
Copy link
Contributor Author

Yeah the destination still needs to be validated. So if I understand correctly, we can:

  • Add a healthcheckInterval option in the config
    • if it is 0 or not set, the health check will happen once during start up
    • if set to anything else, it would periodically heartbeat the destination
  • Add an option to change the way to healthcheck. It is currently a GET call but we want a way to make POST call along with the token to validate. maybe a bool option validateTokenOnHealthcheck:
    • If set to false, health check would do a GET call
    • if set to true, health check would make a POST call to internal index with the token provided.

@atoulme
Copy link
Contributor

atoulme commented Mar 10, 2023

You're using interchangeably healthcheck and heartbeat. This is creating confusion. These are not the same thing.

if it is 0 or not set, the health check will happen once during start up
No, this would be a change of the behavior of the exporter. The exporter already has a health check configuration option on start.

@splunkericl
Copy link
Contributor Author

There are some similarities between healthcheck and heartbeat. But if we want to define them clearly:

  • healthcheck is the indicator whether the destination stack is up through a health endpoint
  • heartbeat is whether exporter can send dummy data using provided token to destination stack periodically, which also requires destination stack to be up.

So if it makes sense, we can put the heartbeat functionality into the healthcheck to reduce the number of functionalities end user needs to understand.

@atoulme
Copy link
Contributor

atoulme commented Mar 14, 2023

These are distinct concepts, and we'd want to toggle them on and off separately in my opinion.
If anything, we will confuse users if we conflate those.

It doesn't look like heartbeat data is dummy data to me. It looks like diagnostic data sent to upstream so that operations can check that the collector is reporting in properly, even when no data is being sent.

With that, my definition of heartbeat is:

  • heartbeat sends periodical diagnostic updates indicating the health, version and information about the environment of the collector to the endpoint so it may be collected for diagnostics.

@splunkericl
Copy link
Contributor Author

That is fair. I think this definition makes sense and we should separate the configuration between healthcheck and heartbeat then.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 15, 2023
@dmitryax
Copy link
Member

Closing as done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants