[Azure] Add input metrics to the azure-eventhub input #35739

zmoog · 2023-06-11T11:15:23Z

What does this PR do?

Add input metrics to keep track of the following data categories:

messages
events
general information (processing time and decode errors)

Messages vs. Events

Messages are the raw data received from the event hub. Here's an example of a
message:

{
  "records": [
    {
      "time": "2019-12-17T13:43:44.4946995Z",
      "test": "this is some message"
    }
  ]
}

Events are the objects inside the records array. Here's an example of an event
from the above message:

{
  "time": "2019-12-17T13:43:44.4946995Z",
  "test": "this is some message"
}

Events contained in a message's records key are the actual logs from Azure; the input creates one document in Elasticsearch for each event.

This draft keeps track of the following conditions.

Messages

received_messages_total: the number of messages received from the event hub.
received_bytes_total: the number of bytes received from the event hub.
sanitized_messages_total: the number of messages sanitized successfully (some Azure services send malformed JSON documents).
processed_messages_total: the number of messages that were processed successfully.

Events:

received_events_total: tracks the number of events received decoding messages.
sent_events_total: the number of events sent successfully to the outlet.

General:

processing_time: the time it takes to process a message.
decode_errors_total: tracks the number of errors that occurred while decoding a message.

On generating the input ID

Since the azure-eventhub is an input v1, we hashed the configuration structure to generate the unique ID needed by the registry. This will be removed as we migrate this input from v1 to v2 input API.

Why is it important?

Input metrics are a great tool to inspect the input state for troubleshooting and support purposes.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

[ ]

How to test this PR locally

Related issues

Related https://github.com/elastic/security-team/issues/6389

Use cases

Screenshots

Logs

mergify · 2023-06-11T11:15:58Z

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @zmoog? 🙏.
For such, you'll need to label your PR with:

The upcoming major version of the Elastic Stack
The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

elasticmachine · 2023-06-11T12:12:17Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2023-08-15T07:14:34.097+0000
Duration: 76 min 17 sec

Test stats 🧪

Test	Results
Failed	0
Passed	3132
Skipped	176
Total	3308

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages and run the E2E tests.
/beats-tester : Run the installation tests with beats-tester.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

x-pack/filebeat/input/azureeventhub/input.go

x-pack/filebeat/input/azureeventhub/metrics.go

x-pack/filebeat/input/azureeventhub/input.go

andrewkroh

Can you please update the asciidoc file for this input to have a table similar to the other inputs that now expose metrics.

andrewkroh · 2023-06-20T18:55:07Z

x-pack/filebeat/input/azureeventhub/input.go

 				"azure":   azure,
 			},
 			Private: event.Data,
 		})
 		if !ok {
+			// a.metrics.publishingErrors.Inc()


IIRC this only every returns false (e.g. !ok) upon shutdown to indicate that it's stopping. No other error states are conveyed this way. So I think you are correct here. We should just remove the commented line.

I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.

I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.

Yep, I'll look into this in one of the next iterations. I have plans to migrate to input v2 and the new Event Hub SDK.

zmoog · 2023-06-20T20:38:38Z

Can you please update the asciidoc file for this input to have a table similar to the other inputs that now expose metrics.

Great, I missed this section; thanks for the heads up.

mergify · 2023-08-14T19:10:01Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b zmoog/azure-eventhub-input-metrics upstream/zmoog/azure-eventhub-input-metrics
git merge upstream/main
git push upstream zmoog/azure-eventhub-input-metrics

In this first iteration, the metrics keep track of the following data types: - events - records Events are the event delivered by the event hub; each event usually contains a list of records. Records are the actual logs from Azure; the input creates one document in Elasticsearch for each record. This draft keeps track of the following conditions. Events: - received: the input received from an event from the Event Hub - sanitized: the event contains invalid JSON, and the input tried fixing it - deserialization failed: the event contains invalid JSON; sanitization was ineffective - processed: the input processed all the records in the event Records: - received: the input unpacked a record from an event - serialization failed: failed to serialize the record for dispatching - processed: the input dispatched the record to the outlet

On v1 inputs, using the hash function of the config is a common practice.

This way it's more evident the two go along together. I am also adding comments for underline the ID will go away as we migrate the input to the input V2 API.

Adds a few more log message during major lifecycle events like failures in event publishing and input stopping.

girodav

Great work! :)

girodav · 2023-08-15T07:25:56Z

x-pack/filebeat/docs/inputs/input-azure-eventhub.asciidoc

+| Metric                       | Description
+| `received_messages_total`    | Number of messages received from the event hub.
+| `received_bytes_total`       | Number of bytes received from the event hub.
+| `sanitized_messages_total`   | Number of messages that were sanitized successfully.


NIT: I wonder if we should explain what "sanitized" refers to here. I don't think that the difference between "processed" and "sanitized" is straightforward for a user without context.

* Draft input metrics about events and records In this first iteration, the metrics keep track of the following data types: - events - records Events are the event delivered by the event hub; each event usually contains a list of records. Records are the actual logs from Azure; the input creates one document in Elasticsearch for each record. This draft keeps track of the following conditions. Events: - received: the input received from an event from the Event Hub - sanitized: the event contains invalid JSON, and the input tried fixing it - deserialization failed: the event contains invalid JSON; sanitization was ineffective - processed: the input processed all the records in the event Records: - received: the input unpacked a record from an event - serialization failed: failed to serialize the record for dispatching - processed: the input dispatched the record to the outlet * Add test cases for input metrics * Fix linter objections * Fix typos and stale comments * Set input ID with the config hash On v1 inputs, using the hash function of the config is a common practice. * Simplify and adopt conventions from other inputs * Add input metrics docs * Remove redundant log messages and leftovers * Move input ID and metrics closer This way it's more evident the two go along together. I am also adding comments for underline the ID will go away as we migrate the input to the input V2 API. * Add a guard to unregister() * Log major lifecycle events Adds a few more log message during major lifecycle events like failures in event publishing and input stopping. * Update CHANGELOG --------- Co-authored-by: Davide Girardi <1390902+girodav@users.noreply.github.com>

zmoog added enhancement Team:Cloud-Monitoring Label for the Cloud Monitoring team labels Jun 11, 2023

zmoog self-assigned this Jun 11, 2023

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jun 11, 2023

zmoog marked this pull request as ready for review June 19, 2023 15:20

zmoog requested a review from a team as a code owner June 19, 2023 15:20

zmoog requested review from andrewkroh, P1llus and efd6 June 19, 2023 15:41

efd6 reviewed Jun 19, 2023

View reviewed changes

andrewkroh reviewed Jun 20, 2023

View reviewed changes

zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch from d64995e to eae6f2f Compare June 20, 2023 20:41

andrewkroh approved these changes Jun 21, 2023

View reviewed changes

zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch 3 times, most recently from 9180163 to cfad760 Compare August 14, 2023 15:44

zmoog added 9 commits August 14, 2023 22:20

Add test cases for input metrics

1e149b5

Fix linter objections

3508c13

Fix typos and stale comments

46792ef

Set input ID with the config hash

a7d7dc3

On v1 inputs, using the hash function of the config is a common practice.

Simplify and adopt conventions from other inputs

6be69dd

Add input metrics docs

b612d6d

Remove redundant log messages and leftovers

7be9357

Move input ID and metrics closer

754337a

This way it's more evident the two go along together. I am also adding comments for underline the ID will go away as we migrate the input to the input V2 API.

zmoog added 3 commits August 14, 2023 22:20

Add a guard to unregister()

1649f79

Log major lifecycle events

9d48263

Adds a few more log message during major lifecycle events like failures in event publishing and input stopping.

Update CHANGELOG

c934192

zmoog force-pushed the zmoog/azure-eventhub-input-metrics branch from cfad760 to c934192 Compare August 14, 2023 20:22

Merge branch 'main' into zmoog/azure-eventhub-input-metrics

cbc4fde

girodav approved these changes Aug 15, 2023

View reviewed changes

girodav enabled auto-merge (squash) August 15, 2023 08:24

girodav merged commit 7f5a1ec into elastic:main Aug 15, 2023
21 checks passed

zmoog deleted the zmoog/azure-eventhub-input-metrics branch August 17, 2023 08:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Azure] Add input metrics to the azure-eventhub input #35739

[Azure] Add input metrics to the azure-eventhub input #35739

zmoog commented Jun 11, 2023 •

edited

mergify bot commented Jun 11, 2023

elasticmachine commented Jun 11, 2023 •

edited by jenkins-beats-ci bot

Build stats

Test stats 🧪

andrewkroh left a comment

andrewkroh Jun 20, 2023

zmoog Jun 20, 2023

zmoog commented Jun 20, 2023

mergify bot commented Aug 14, 2023

girodav left a comment

girodav Aug 15, 2023

[Azure] Add input metrics to the azure-eventhub input #35739

[Azure] Add input metrics to the azure-eventhub input #35739

Conversation

zmoog commented Jun 11, 2023 • edited

What does this PR do?

Messages vs. Events

Messages

On generating the input ID

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

mergify bot commented Jun 11, 2023

elasticmachine commented Jun 11, 2023 • edited by jenkins-beats-ci bot

💚 Build Succeeded

Build stats

Test stats 🧪

💚 Flaky test report

🤖 GitHub comments

andrewkroh left a comment

Choose a reason for hiding this comment

andrewkroh Jun 20, 2023

Choose a reason for hiding this comment

zmoog Jun 20, 2023

Choose a reason for hiding this comment

zmoog commented Jun 20, 2023

mergify bot commented Aug 14, 2023

girodav left a comment

Choose a reason for hiding this comment

girodav Aug 15, 2023

Choose a reason for hiding this comment

zmoog commented Jun 11, 2023 •

edited

elasticmachine commented Jun 11, 2023 •

edited by jenkins-beats-ci bot