[Azure] Add input metrics to the azure-eventhub input #35739
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
Can you please update the asciidoc file for this input to include a metrics table similar to the ones in the other inputs that now expose metrics?
        "azure": azure,
    },
    Private: event.Data,
})
if !ok {
    // a.metrics.publishingErrors.Inc()
IIRC this only ever returns false (i.e. `!ok`) upon shutdown, to indicate that it's stopping. No other error states are conveyed this way. So I think you are correct here. We should just remove the commented line.
I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.
> I think as a future improvement to the input metrics we should find a way to expose metrics from the pipeline client(s) associated to the input. This way we can associate the input with counters for drops and duplicates.
Yep, I'll look into this in one of the next iterations. I have plans to migrate to input v2 and the new Event Hub SDK.
Great, I missed this section; thanks for the heads up.
This pull request is now in conflicts. Could you fix it? 🙏
In this first iteration, the metrics keep track of the following data types:

- events
- records

Events are the items delivered by the event hub; each event usually contains a list of records. Records are the actual logs from Azure; the input creates one document in Elasticsearch for each record.

This draft keeps track of the following conditions.

Events:

- received: the input received an event from the Event Hub
- sanitized: the event contains invalid JSON, and the input tried fixing it
- deserialization failed: the event contains invalid JSON; sanitization was ineffective
- processed: the input processed all the records in the event

Records:

- received: the input unpacked a record from an event
- serialization failed: the input failed to serialize the record for dispatching
- processed: the input dispatched the record to the outlet
On v1 inputs, using the hash function of the config is a common practice.
This way it's more evident that the two go together. I am also adding comments to underline that the ID will go away as we migrate the input to the input v2 API.
Adds a few more log messages during major lifecycle events, like failures in event publishing and input stopping.
Great work! :)
| Metric | Description |
| `received_messages_total` | Number of messages received from the event hub. |
| `received_bytes_total` | Number of bytes received from the event hub. |
| `sanitized_messages_total` | Number of messages that were sanitized successfully. |
NIT: I wonder if we should explain what "sanitized" refers to here. I don't think that the difference between "processed" and "sanitized" is straightforward for a user without context.
* Draft input metrics about events and records

  In this first iteration, the metrics keep track of the following data types:

  - events
  - records

  Events are the items delivered by the event hub; each event usually contains a list of records. Records are the actual logs from Azure; the input creates one document in Elasticsearch for each record.

  This draft keeps track of the following conditions.

  Events:

  - received: the input received an event from the Event Hub
  - sanitized: the event contains invalid JSON, and the input tried fixing it
  - deserialization failed: the event contains invalid JSON; sanitization was ineffective
  - processed: the input processed all the records in the event

  Records:

  - received: the input unpacked a record from an event
  - serialization failed: the input failed to serialize the record for dispatching
  - processed: the input dispatched the record to the outlet

* Add test cases for input metrics

* Fix linter objections

* Fix typos and stale comments

* Set input ID with the config hash

  On v1 inputs, using the hash function of the config is a common practice.

* Simplify and adopt conventions from other inputs

* Add input metrics docs

* Remove redundant log messages and leftovers

* Move input ID and metrics closer

  This way it's more evident that the two go together. I am also adding comments to underline that the ID will go away as we migrate the input to the input v2 API.

* Add a guard to unregister()

* Log major lifecycle events

  Adds a few more log messages during major lifecycle events, like failures in event publishing and input stopping.

* Update CHANGELOG

---------

Co-authored-by: Davide Girardi <1390902+girodav@users.noreply.github.com>
What does this PR do?
Add input metrics to keep track of two data categories: messages and events.
Messages vs. Events
Messages are the raw data received from the event hub.
Events are the objects inside a message's `records` array. They are the actual logs from Azure; the input creates one document in Elasticsearch for each event.
This draft keeps track of the following conditions.
Messages:
- `received_messages_total`: the number of messages received from the event hub.
- `received_bytes_total`: the number of bytes received from the event hub.
- `sanitized_messages_total`: the number of messages sanitized successfully (some Azure services send malformed JSON documents).
- `processed_messages_total`: the number of messages that were processed successfully.
Events:
- `received_events_total`: the number of events received by decoding messages.
- `sent_events_total`: the number of events sent successfully to the outlet.
General:
- `processing_time`: the time it takes to process a message.
- `decode_errors_total`: the number of errors that occurred while decoding a message.
On generating the input ID
Since azure-eventhub is a v1 input, we hash the configuration structure to generate the unique ID needed by the registry. This will be removed as we migrate this input from the v1 to the v2 input API.
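For readers unfamiliar with the pattern, here is a minimal sketch of deriving a stable ID from a configuration struct. It is not the code from this PR: `inputConfig`, its fields, `configID`, the `azure-eventhub-` prefix, and the choice of SHA-256 over the JSON encoding are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// inputConfig is a hypothetical stand-in for the input configuration.
// The real azure-eventhub config struct has different fields.
type inputConfig struct {
	EventHub       string `json:"eventhub"`
	ConsumerGroup  string `json:"consumer_group"`
	StorageAccount string `json:"storage_account"`
}

// configID derives an identifier from the configuration by hashing its
// JSON encoding. Struct fields are encoded in declaration order, so the
// same configuration always yields the same ID.
// Assumption: SHA-256 and the "azure-eventhub-" prefix are illustrative.
func configID(cfg inputConfig) (string, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	// Keep only the first 8 bytes (16 hex characters) to keep the ID short.
	return "azure-eventhub-" + hex.EncodeToString(sum[:8]), nil
}

func main() {
	id, err := configID(inputConfig{EventHub: "logs", ConsumerGroup: "$Default"})
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Two inputs with identical configurations get the same ID under this scheme, which is one reason a config hash is only a stopgap until the input API provides the ID.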
Why is it important?
Input metrics are a great tool to inspect the input state for troubleshooting and support purposes.
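To make the counters from the PR description concrete, the sketch below shows how they could relate to each other while one message is unpacked into events. It is a hedged illustration, not the input's implementation: `inputMetrics`, `process`, and `sanitize` are hypothetical names, plain `uint64` fields stand in for the real monitoring registry, the newline replacement in `sanitize` is an assumed repair strategy, and `processing_time` is left out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// inputMetrics is a hypothetical holder for the counters named in the PR.
// Plain fields are used for brevity; it is not safe for concurrent use.
type inputMetrics struct {
	receivedMessages  uint64 // received_messages_total
	receivedBytes     uint64 // received_bytes_total
	sanitizedMessages uint64 // sanitized_messages_total
	processedMessages uint64 // processed_messages_total
	receivedEvents    uint64 // received_events_total
	sentEvents        uint64 // sent_events_total
	decodeErrors      uint64 // decode_errors_total
}

// message models the envelope: events live in the "records" array.
type message struct {
	Records []json.RawMessage `json:"records"`
}

// sanitize is an assumed repair step: raw newlines inside JSON strings
// are invalid, so they are replaced with spaces.
func sanitize(raw []byte) []byte {
	return bytes.ReplaceAll(raw, []byte("\n"), []byte(" "))
}

// process updates the counters for one message and hands each event to send.
// send returning false is treated as "the outlet is stopping".
func (m *inputMetrics) process(raw []byte, send func(event []byte) bool) {
	m.receivedMessages++
	m.receivedBytes += uint64(len(raw))

	if !json.Valid(raw) {
		raw = sanitize(raw)
		if !json.Valid(raw) {
			m.decodeErrors++
			return
		}
		m.sanitizedMessages++
	}

	var msg message
	if err := json.Unmarshal(raw, &msg); err != nil {
		m.decodeErrors++
		return
	}
	m.receivedEvents += uint64(len(msg.Records))

	for _, rec := range msg.Records {
		if !send(rec) {
			return
		}
		m.sentEvents++
	}
	m.processedMessages++
}

func main() {
	m := &inputMetrics{}
	send := func(event []byte) bool { return true }

	// The payloads below are made up for the example.
	m.process([]byte(`{"records":[{"a":1},{"b":2}]}`), send)
	m.process([]byte("{\"records\":[{\"msg\":\"line1\nline2\"}]}"), send)
	m.process([]byte(`not json`), send)

	fmt.Printf("%+v\n", *m)
}
```

Read this way, a gap between `received_messages_total` and `processed_messages_total` points at decoding or publishing problems, while `received_events_total` versus `sent_events_total` shows whether unpacked events actually reached the outlet.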