
Filestream monitoring processors used to prevent error loops are wrong #2388

Closed
2 tasks
cmacknz opened this issue Mar 23, 2023 · 1 comment · Fixed by #2982
Labels: Team:Elastic-Agent (Label for the Agent team)

Comments

cmacknz (Member) commented Mar 23, 2023

Issue

We attempt to automatically drop error logs from the filestream monitoring instance to avoid failing to ship an event, logging that failure, failing to ship the logged event, and so on in an infinite failure loop.

Today these processors are defined as:

"processors": []interface{}{
// drop all events from monitoring components (do it early)
// without dropping these events the filestream gets stuck in an infinite loop
// if filestream hits an issue publishing the events it logs an error which then filestream monitor
// will read from the logs and try to also publish that new log message (thus the infinite loop)
map[string]interface{}{
"drop_event": map[string]interface{}{
"when": map[string]interface{}{
"or": []interface{}{
map[string]interface{}{
"equals": map[string]interface{}{
"component.dataset": fmt.Sprintf("elastic_agent.filestream_%s", monitoringOutput),
},
},
// for consistency this monitor is also not shipped (fetch-able with diagnostics)
map[string]interface{}{
"equals": map[string]interface{}{
"component.dataset": fmt.Sprintf("elastic_agent.beats_metrics_%s", monitoringOutput),
},
},
// for consistency with this monitor is also not shipped (fetch-able with diagnostics)
map[string]interface{}{
"equals": map[string]interface{}{
"component.dataset": fmt.Sprintf("elastic_agent.http_metrics_%s", monitoringOutput),
},
},
},
},
},
},

This results in the following in the beat-rendered-config of the filestream-monitoring input:

  - drop_event:
      when:
        or:
        - equals:
            component:
              dataset: elastic_agent.filestream_monitoring
        - equals:
            component:
              dataset: elastic_agent.beats_metrics_monitoring
        - equals:
            component:
              dataset: elastic_agent.http_metrics_monitoring

The problem is that the datasets are wrong. Looking at sample events, the dataset identifier is actually:

{
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  }
}

This was likely broken when we moved away from per-process data stream names and reverted to the names used in 8.5.x and earlier releases. See #1814.
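
To make the mismatch concrete, here is a minimal standalone sketch (not code from the agent; the values are taken from the configuration and sample event above):

package main

import "fmt"

func main() {
	// Dataset value the drop_event condition is configured with today,
	// with monitoringOutput == "monitoring".
	configured := fmt.Sprintf("elastic_agent.filestream_%s", "monitoring")

	// Dataset value the filestream monitoring component actually reports
	// (see the sample event above).
	actual := "elastic_agent.filebeat"

	// The equals condition never matches, so the events are never dropped.
	fmt.Println(configured == actual) // prints "false"
}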

Definition of done

  • Filter on the correct dataset
  • Test that we are correctly dropping the expected events and not other events
@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Mar 23, 2023
@pierrehilbert pierrehilbert assigned belimawr and unassigned rdner Jun 21, 2023
belimawr (Contributor) commented Jul 3, 2023

@cmacknz I have a draft PR (tests are still missing): #2982

I had to filter by component.id because the dataset does not contain information about whether the component is a monitoring component.
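
For illustration, a sketch of that approach in the same style as the processor definition above; the filestream-monitoring ID comes from the sample event in this issue, the metrics monitor IDs are placeholders, and #2982 has the actual change:

"processors": []interface{}{
	// drop all events produced by the monitoring components themselves,
	// matching on component.id instead of component.dataset
	map[string]interface{}{
		"drop_event": map[string]interface{}{
			"when": map[string]interface{}{
				"or": []interface{}{
					map[string]interface{}{
						"equals": map[string]interface{}{
							"component.id": "filestream-monitoring",
						},
					},
					map[string]interface{}{
						"equals": map[string]interface{}{
							// placeholder ID for the beats metrics monitor
							"component.id": "beat/metrics-monitoring",
						},
					},
					map[string]interface{}{
						"equals": map[string]interface{}{
							// placeholder ID for the agent HTTP metrics monitor
							"component.id": "http/metrics-monitoring",
						},
					},
				},
			},
		},
	},
},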
