Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not set require_alias when sending to indices even if ILM is enabled #30055

Closed
wants to merge 4 commits into from

Conversation

kvch
Copy link
Contributor

@kvch kvch commented Jan 27, 2022

What does this PR do?

Previously, if ILM was enabled in Metricbeat, require_alias was set for all Bulk API requests. However, when xpack.enabled is set to true in beat, logstash, kibana or elasticsearch modules, they overwrite the target indices of the events. So Metricbeat can send the events to the monitoring indices. The monitoring indices are regular indices, not aliases. Hence, we got an error for Bulk API requests as they were not aliases.

Why is it important?

Without this fix, Metricbeat cannot send monitoring events to Elasticsearch.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Enable Beat module in Metricbeat and set the events to the monitoring index:

 metricbeat.modules:
 - module: beat
   metricsets:
     - stats
     - state
   period: 10s
   hosts: ["http://localhost:5066"]
   xpack.enabled: true

 output.elasticsearch:
   hosts: ["localhost:9200"]

Start Filebeat with HTTP endpoint enabled:

http.enabled: true

Start Metricbeat and check if the events can be indexed.

Related issues

Closes #29920
Closes #30044

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 27, 2022
@kvch kvch added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jan 27, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jan 27, 2022
@kvch kvch changed the title Fix libbeat bulk for monitoring Do not set require_alias when sending to indices even if ILM is enabled Jan 27, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jan 27, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-01-27T11:28:24.405+0000

  • Duration: 119 min 24 sec

  • Commit: 50d6a7e

Test stats 🧪

Test Results
Failed 0
Passed 55239
Skipped 5322
Total 60561

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks Noemi!

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is for 7.17. For stack monitoring, why is 8.0 not also affected?

if v, _ := events.GetMetaStringValue(*event, events.FieldMetaAlias); v != "" {
return true
}
if v, _ := events.GetMetaStringValue(*event, events.FieldMetaIndex); v != "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what all the places are where we us this. I thought we also use this in places like k8s do adjust / overwrite the automatically created index name. In these scenarios this would not be correct.

An alternative approach is to introduce events.MetaInternalIndex or similar that is purely set by the stack monitoring modules so it should not have any side effects.

One more idea, we hardcode that if the index name matches .monitoring-* we don't do the check.

I'm worried that if we are not very specific with the check, it will have unexpected side effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what all the places are where we us this. I thought we also use this in places like k8s do adjust / overwrite the automatically created index name. In these scenarios this would not be correct.

I am not sure I understand why it would be an issue. If anything overwrites the index or raw_index, we do not set require_alias, so there is no problem during ingestion. If something sets alias, the target is an alias so require_alias should be safe.

One more idea, we hardcode that if the index name matches .monitoring-* we don't do the check.

I also considered it, but I discarded the idea because I do not want to write monitoring specific code in a general elasticsearch client code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why it would be an issue. If anything overwrites the index or raw_index, we do not set require_alias, so there is no problem during ingestion. If something sets alias, the target is an alias so require_alias should be safe.

The initial change was introduced to fix an issue when an index is deleted. Now lets assume for a moment (alias partially made up) the metricset kubernetes.foo_bar sets the alias name to kubernetes.foo.bar specifically. This means the alias feature is disabled. Now the user wipes the alias with all indices and we start to ingest data into kuberentes.foo.bar. If we see a support request for this, our first answer is this is not possible. But turns out, it is in this case. What you have will fix most cases but it creates incosistency. Lets rather revert the previous change and discuss on how we should deal with it.

@@ -93,7 +94,7 @@ func (e *Event) BeatEvent(module, metricSet string, modifiers ...EventModifier)

// Set index prefix to overwrite default
if e.Index != "" {
b.Meta = common.MapStr{"index": e.Index}
b.Meta = common.MapStr{events.FieldMetaIndex: e.Index}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above, maybe there are other ways.

@kvch
Copy link
Contributor Author

kvch commented Jan 27, 2022

This PR is for 7.17. For stack monitoring, why is 8.0 not also affected?

The original issue is that sometimes customers accidentally removed the write alias and then it got recreated as an index breaking ILM for Beats (mostly Heartbeat). In 8.x, it is no longer an issue because the data stream manages the alias and its own backing indices for us. There is nothing to be done in Beats, just sending events to the name of the data stream.
However, it does not help us in 7.x. So we decided to address the problem by adding require_alias to the Bulk API calls when ILM is enabled.

@ruflin
Copy link
Member

ruflin commented Jan 27, 2022

To not block the release, could we revert previous change and instead have a fix into 7.17.1 for example?

@ruflin
Copy link
Member

ruflin commented Jan 27, 2022

For 8.0, I'm asking about the .monitoring indices no the ones create by us. But I think I got the answer between the lines. We never introduced this "check" in 8.0 so writing to pure indices still works as before.

@kvch
Copy link
Contributor Author

kvch commented Jan 27, 2022

Revert previous PR: #30058

@cmacknz
Copy link
Member

cmacknz commented Jan 27, 2022

The x-pack flag was removed in 8.0 and a lot of the behaviour around the monitoring indices changed, here is one of the top level PRs for more context: #24427

@kvch
Copy link
Contributor Author

kvch commented Jan 27, 2022

We agreed that this PR will be merged in the future when we are 100% sure that there are no unintended side effects.

@mergify
Copy link
Contributor

mergify bot commented Jan 27, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-libbeat-bulk-for-monitoring upstream/fix-libbeat-bulk-for-monitoring
git merge upstream/7.17
git push upstream fix-libbeat-bulk-for-monitoring

@marius-dr
Copy link
Member

so this has to go into the 7.17 after today's release in order to have metricbeat monitoring work on the snapshots, right? I can confirm that the last BC works fine, while the last snapshot need this fix.

@kvch
Copy link
Contributor Author

kvch commented Feb 2, 2022

so this has to go into the 7.17 after today's release in order to have metricbeat monitoring work on the snapshots, right? I can confirm that the last BC works fine, while the last snapshot need this fix.

No, it is not required. The whole feature got reverted in 7.17 that this PR was supposed to fix. In 8.x, we also never used this feature because Beats send to data streams. What is the problem in the last snapshot?

@kvch
Copy link
Contributor Author

kvch commented Feb 2, 2022

I just synced with @marius-dr and the issue no longer persist in the latest snapshot of 7.17.

@cmacknz
Copy link
Member

cmacknz commented Feb 23, 2022

@kvch can this PR be closed?

@jlind23
Copy link
Collaborator

jlind23 commented Mar 2, 2022

up @kvch

@kvch
Copy link
Contributor Author

kvch commented Mar 7, 2022

The PR is outdated. We will need to rework the whole fix. I am closing this and reopen the original issue.

@kvch kvch closed this Mar 7, 2022
@simitt
Copy link
Contributor

simitt commented Mar 9, 2022

@kvch could you link to the reopened issue please so we can follow along. The changes are a precondition for elastic/apm-server#7116

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants