Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix repeated error save loops in File Settings Service #90271

Merged
merged 7 commits into from
Sep 23, 2022

Conversation

grcevski
Copy link
Contributor

There were two bugs that were exposed by a recent Windows failures on File Settings:

  • When we process an initial settings file which had errors, the reserved metadata version that was saved was 0, which is wrong because we use 0 to mean we need to touch and reset the file from a snapshot restore. The correct default value should be -1, meaning no successful file settings were ever written. Because we used 0, the code that monitors for 0 version was constantly resetting the modified timestamp on the settings file, which caused a flurry of save activity, eventually leading to very rare failure when we time out waiting on the error state to be saved.
  • Windows apparently needs write file permission to be able to touch the settings.json file in the operator directory. This means that this functionality never worked correctly in production on Windows.

Closes #90222

@grcevski grcevski added >bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team v8.4.3 v8.5.1 v8.6.0 labels Sep 22, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine
Copy link
Collaborator

Hi @grcevski, I've created a changelog YAML for you.

@grcevski
Copy link
Contributor Author

I'll follow-up shortly with additional tests that ensure that the metadata version is -1 when only error was ever saved.

Nikola Grcevski added 5 commits September 22, 2022 15:13
- Prevent repeated error save calls
- Allow permissions for touch to operator settings
@grcevski grcevski added the test-windows Trigger CI checks on Windows label Sep 22, 2022
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@grcevski
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-1

@grcevski
Copy link
Contributor Author

@elasticsearchmachine run elasticsearch-ci/part-3-windows

@grcevski grcevski merged commit 6af2b5c into elastic:main Sep 23, 2022
@grcevski grcevski deleted the fix/repeated_error_calls branch September 23, 2022 17:59
This was referenced Sep 23, 2022
lockewritesdocs pushed a commit that referenced this pull request Sep 28, 2022
lockewritesdocs pushed a commit that referenced this pull request Sep 28, 2022
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Sep 28, 2022
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Sep 28, 2022
@csoulios csoulios added v8.5.0 and removed v8.5.1 labels Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team test-windows Trigger CI checks on Windows v8.4.3 v8.5.0 v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] FileSettingsServiceIT testErrorSaved failing
4 participants