Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Azure] Add input metrics to the azure-eventhub input #35739

Merged
merged 13 commits into from Aug 15, 2023

Conversation

zmoog
Copy link
Contributor

@zmoog zmoog commented Jun 11, 2023

What does this PR do?

Add input metrics to keep track of the following data categories:

  • messages
  • events
  • general information (processing time and decode errors)

Messages vs. Events

Messages are the raw data received from the event hub. Here's an example of a
message:

{
  "records": [
    {
      "time": "2019-12-17T13:43:44.4946995Z",
      "test": "this is some message"
    }
  ]
}

Events are the objects inside the records array. Here's an example of an event
from the above message:

{
  "time": "2019-12-17T13:43:44.4946995Z",
  "test": "this is some message"
}

Events contained in a message's records key are the actual logs from Azure; the input creates one document in Elasticsearch for each event.

This draft keeps track of the following conditions.

Messages

  • received_messages_total: the number of messages received from the event hub.
  • received_bytes_total: the number of bytes received from the event hub.
  • sanitized_messages_total: the number of messages sanitized successfully (some Azure services send malformed JSON documents).
  • processed_messages_total: the number of messages that were processed successfully.

Events:

  • received_events_total: tracks the number of events received decoding messages.
  • sent_events_total: the number of events sent successfully to the outlet.

General:

  • processing_time: the time it takes to process a message.
  • decode_errors_total: tracks the number of errors that occurred while decoding a message.

On generating the input ID

Since the azure-eventhub is an input v1, we hashed the configuration structure to generate the unique ID needed by the registry. This will be removed as we migrate this input from v1 to v2 input API.

Why is it important?

Input metrics are a great tool to inspect the input state for troubleshooting and support purposes.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@zmoog zmoog added enhancement Team:Cloud-Monitoring Label for the Cloud Monitoring team labels Jun 11, 2023
@zmoog zmoog self-assigned this Jun 11, 2023
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 11, 2023
@mergify
Copy link
Contributor

mergify bot commented Jun 11, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @zmoog? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 11, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-08-15T07:14:34.097+0000

  • Duration: 76 min 17 sec

Test stats 🧪

Test Results
Failed 0
Passed 3132
Skipped 176
Total 3308

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@zmoog zmoog marked this pull request as ready for review June 19, 2023 15:20
@zmoog zmoog requested a review from a team as a code owner June 19, 2023 15:20
@zmoog zmoog requested review from andrewkroh, P1llus and efd6 June 19, 2023 15:41
x-pack/filebeat/input/azureeventhub/input.go Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/metrics.go Outdated Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/input.go Outdated Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/input.go Outdated Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/input.go Outdated Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/input.go Outdated Show resolved Hide resolved
x-pack/filebeat/input/azureeventhub/input.go Outdated Show resolved Hide resolved
Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update the asciidoc file for this input to have a table similar to the other inputs that now expose metrics.

"azure": azure,
},
Private: event.Data,
})
if !ok {
// a.metrics.publishingErrors.Inc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this only every returns false (e.g. !ok) upon shutdown to indicate that it's stopping. No other error states are conveyed this way. So I think you are correct here. We should just remove the commented line.

I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.

Yep, I'll look into this in one of the next iterations. I have plans to migrate to input v2 and the new Event Hub SDK.

@zmoog
Copy link
Contributor Author

zmoog commented Jun 20, 2023

Can you please update the asciidoc file for this input to have a table similar to the other inputs that now expose metrics.

Great, I missed this section; thanks for the heads up.

@zmoog zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch from d64995e to eae6f2f Compare June 20, 2023 20:41
@zmoog zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch 3 times, most recently from 9180163 to cfad760 Compare August 14, 2023 15:44
@mergify
Copy link
Contributor

mergify bot commented Aug 14, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b zmoog/azure-eventhub-input-metrics upstream/zmoog/azure-eventhub-input-metrics
git merge upstream/main
git push upstream zmoog/azure-eventhub-input-metrics

In this first iteration, the metrics keep track of the following data
types:

- events
- records

Events are the event delivered by the event hub; each event usually
contains a list of records.

Records are the actual logs from Azure; the input creates one document
in Elasticsearch for each record.

This draft keeps track of the following conditions.

Events:

- received: the input received from an event from the Event Hub
- sanitized: the event contains invalid JSON, and the input tried fixing it
- deserialization failed: the event contains invalid JSON; sanitization was ineffective
- processed: the input processed all the records in the event

Records:

- received: the input  unpacked a record from an event
- serialization failed: failed to serialize the record for dispatching
- processed: the input dispatched the record to the outlet
On v1 inputs, using the hash function of the config is a common
practice.
This way it's more evident the two go along together. I am also adding
comments for underline the ID will go away as we migrate the input
to the input V2 API.
Adds a few more log message during major lifecycle events like
failures in event publishing and input stopping.
@zmoog zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch from cfad760 to c934192 Compare August 14, 2023 20:22
Copy link
Contributor

@girodav girodav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! :)

| Metric | Description
| `received_messages_total` | Number of messages received from the event hub.
| `received_bytes_total` | Number of bytes received from the event hub.
| `sanitized_messages_total` | Number of messages that were sanitized successfully.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I wonder if we should explain what "sanitized" refers to here. I don't think that the difference between "processed" and "sanitized" is straightforward for a user without context.

@girodav girodav enabled auto-merge (squash) August 15, 2023 08:24
@girodav girodav merged commit 7f5a1ec into elastic:main Aug 15, 2023
21 checks passed
@zmoog zmoog deleted the zmoog/azure-eventhub-input-metrics branch August 17, 2023 08:01
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* Draft input metrics about events and records

In this first iteration, the metrics keep track of the following data
types:

- events
- records

Events are the event delivered by the event hub; each event usually
contains a list of records.

Records are the actual logs from Azure; the input creates one document
in Elasticsearch for each record.

This draft keeps track of the following conditions.

Events:

- received: the input received from an event from the Event Hub
- sanitized: the event contains invalid JSON, and the input tried fixing it
- deserialization failed: the event contains invalid JSON; sanitization was ineffective
- processed: the input processed all the records in the event

Records:

- received: the input  unpacked a record from an event
- serialization failed: failed to serialize the record for dispatching
- processed: the input dispatched the record to the outlet

* Add test cases for input metrics

* Fix linter objections

* Fix typos and stale comments

* Set input ID with the config hash

On v1 inputs, using the hash function of the config is a common
practice.

* Simplify and adopt conventions from other inputs

* Add input metrics docs

* Remove redundant log messages and leftovers

* Move input ID and metrics closer

This way it's more evident the two go along together. I am also adding
comments for underline the ID will go away as we migrate the input
to the input V2 API.

* Add a guard to unregister()

* Log major lifecycle events

Adds a few more log message during major lifecycle events like
failures in event publishing and input stopping.

* Update CHANGELOG

---------

Co-authored-by: Davide Girardi <1390902+girodav@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Team:Cloud-Monitoring Label for the Cloud Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants