Interaction between flood stage and system indices #64251

Open
Tracked by #50251
jaymode opened this issue Oct 27, 2020 · 3 comments
Labels
:Core/Infra/Core Core issues without another label >enhancement :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) Team:Core/Infra Meta label for core/infra team Team:Security Meta label for security team

Comments

@jaymode
Member

jaymode commented Oct 27, 2020

When a node hits the flood stage watermark, all indices on that node get the index.blocks.read_only_allow_delete setting applied with a value of true. This currently applies to system indices as well as data indices. When this happens, system operations that require writes begin to fail. That is acceptable for certain non-critical actions, but for critical actions we need to consider whether failure is the right thing to do. To limit the scope of actions that could bypass the flood stage read-only block, I have attempted to enumerate the operations that I believe we should treat as critical and that would otherwise fail.
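To make the current behavior concrete, here is a minimal toy model in Python, not Elasticsearch code. The `Index` class, the helper names, and the index names are hypothetical, and the 0.95 threshold is an assumption for illustration; only the setting name index.blocks.read_only_allow_delete comes from the description above.

```python
from dataclasses import dataclass, field

READ_ONLY_ALLOW_DELETE = "index.blocks.read_only_allow_delete"
FLOOD_STAGE = 0.95  # assumed threshold, as a fraction of disk used


@dataclass
class Index:
    """Hypothetical stand-in for an index hosted on a node."""
    name: str
    system: bool = False
    settings: dict = field(default_factory=dict)


def apply_flood_stage(disk_used_fraction, indices, flood_stage=FLOOD_STAGE):
    """Block every index on the node once the node crosses the flood stage.

    Note that system indices are not treated differently from data indices.
    """
    if disk_used_fraction >= flood_stage:
        for index in indices:
            index.settings[READ_ONLY_ALLOW_DELETE] = True
    return indices


def can_write(index):
    """A write is rejected while the read-only-allow-delete block is set."""
    return not index.settings.get(READ_ONLY_ALLOW_DELETE, False)


# Illustrative index names: one data index and one system index.
indices = [Index("logs-2020.10.27"), Index(".security-7", system=True)]
apply_flood_stage(0.97, indices)
writable = [index.name for index in indices if can_write(index)]
```

In this model `writable` ends up empty: the system index is blocked together with the data index, which is the situation the rest of this issue is concerned with.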

Critical Actions

Authentication

One thing that fails once the flood stage is hit is authentication using SAML, OpenID Connect, or delegated PKI and, to a certain extent, Kerberos. SAML, OpenID Connect, and delegated PKI authentication result in the generation of an access token and a refresh token that are used for subsequent access to Elasticsearch; if the token document cannot be written to the security index, the authentication fails. For Kerberos, Elasticsearch itself does not require tokens for subsequent authentication, but not using them has a significant performance impact. Kerberos authentication through Kibana requires the token service to be enabled, so to users accessing Elasticsearch through Kibana it will appear that they cannot authenticate using Kerberos.

A workaround could be to use the built-in users or a file realm user. However, built-in users can be disabled via the API, so unless we allow enabling/disabling a user to bypass the watermark, we cannot rely on them. Additionally, there is a setting that completely disables our reserved realm, which contains the built-in users; that is another reason we should not rely on them being available. A file-based realm is our recommendation for recovery, but we do not require one to be enabled, and we should not make recovering from being over the flood stage more difficult than it needs to be.

Credential Invalidation / Logout

If the security system index becomes read-only, invalidation of API keys and tokens fails. We should do our best to keep these operations available, as they may be needed to stop an influx of data that is pushing the cluster to the flood stage uncontrollably.

SAML and OpenID Connect logout also need the ability to write data to an index as the tokens used are invalidated as part of the logout operation.

Disabling user

Along the same lines, it may become necessary to temporarily disable a user while getting a cluster back up and running, as a means of stopping data from coming in until the cluster can be rebalanced and given any additional resources it needs.

Identity Provider operations

There are probably some actions within this plugin that we may want to allow bypassing a watermark, but I am not familiar enough with the details of this to truly provide a recommendation. @tvernum @jkakavas any thoughts?

Proposal

I'd like to propose that we allow a system index plugin to opt in specific actions that would be allowed to bypass the flood stage block so that they can write data to an index. So far I have only identified security components as needing to bypass the flood stage, and I currently believe that the Security plugin would opt in the following transport actions:

  • TransportSamlLogoutAction
  • TransportSamlAuthenticateAction
  • TransportSamlInvalidateSessionAction
  • TransportInvalidateApiKeyAction
  • TransportSetEnabledAction
  • TransportDelegatePkiAuthenticationAction
  • TransportOpenIdConnectAuthenticateAction
  • TransportOpenIdConnectLogoutAction
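The opt-in mechanism could look roughly like the following Python sketch. This is only an illustration of the idea, not a proposed Elasticsearch API: `FloodStageBypassRegistry`, `opt_in`, `may_bypass`, `check_write`, and `SomeOtherWriteAction` are all hypothetical names, and restricting the bypass to system indices is an assumption about how the rule would be scoped.

```python
READ_ONLY_ALLOW_DELETE = "index.blocks.read_only_allow_delete"


class FloodStageBypassRegistry:
    """Hypothetical registry of actions that plugins have opted in."""

    def __init__(self):
        self._actions_by_plugin = {}

    def opt_in(self, plugin, actions):
        """Record the actions a plugin wants to exempt from the block."""
        self._actions_by_plugin.setdefault(plugin, set()).update(actions)

    def may_bypass(self, action, index_is_system):
        """Only opted-in actions writing to a system index may bypass."""
        if not index_is_system:
            return False
        return any(
            action in actions for actions in self._actions_by_plugin.values()
        )


def check_write(registry, action, index_settings, index_is_system):
    """Allow the write if the index is unblocked or the action may bypass."""
    blocked = index_settings.get(READ_ONLY_ALLOW_DELETE, False)
    return (not blocked) or registry.may_bypass(action, index_is_system)


registry = FloodStageBypassRegistry()
registry.opt_in("security", {
    "TransportSamlLogoutAction",
    "TransportSamlAuthenticateAction",
    "TransportSamlInvalidateSessionAction",
    "TransportInvalidateApiKeyAction",
    "TransportSetEnabledAction",
    "TransportDelegatePkiAuthenticationAction",
    "TransportOpenIdConnectAuthenticateAction",
    "TransportOpenIdConnectLogoutAction",
})

blocked = {READ_ONLY_ALLOW_DELETE: True}
results = {
    # Opted-in action against a blocked system index: allowed.
    "critical_on_system": check_write(
        registry, "TransportInvalidateApiKeyAction", blocked, True),
    # Action that was never opted in (hypothetical name): rejected.
    "other_on_system": check_write(
        registry, "SomeOtherWriteAction", blocked, True),
    # Opted-in action against a blocked data index: rejected.
    "critical_on_data": check_write(
        registry, "TransportInvalidateApiKeyAction", blocked, False),
}
```

Keeping the list explicit and owned by the plugin means the set of exempt actions stays small and easy to audit.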

One item worth considering is a limit on how far critical operations may push past the flood stage. I don't think we should allow critical operations to run the disk out of space, but if the configuration uses byte values, how far past the flood stage do we allow them to go?
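One way to frame that limit is as an explicit headroom budget below the flood stage. The Python sketch below only parameterizes the question and does not answer it: `allow_critical_write` and `headroom_bytes` are hypothetical, and it assumes a byte-valued flood stage expresses the minimum free space that should remain on disk.

```python
GIB = 1024 ** 3


def allow_critical_write(free_bytes, flood_stage_free_bytes,
                         write_bytes, headroom_bytes):
    """Decide whether a critical write may proceed past the flood stage.

    flood_stage_free_bytes: assumed meaning of a byte-valued flood stage,
        i.e. the minimum free space that should remain on the disk.
    headroom_bytes: hypothetical budget below that floor reserved for
        critical operations.
    """
    if write_bytes < 0 or headroom_bytes < 0:
        raise ValueError("sizes must be non-negative")
    # Critical writes may never take free space below this hard floor.
    hard_floor = max(flood_stage_free_bytes - headroom_bytes, 0)
    return free_bytes - write_bytes >= hard_floor


# Flood stage at 10 GiB free with 1 GiB of headroom for critical writes.
within_budget = allow_critical_write(
    free_bytes=int(9.5 * GIB), flood_stage_free_bytes=10 * GIB,
    write_bytes=4096, headroom_bytes=1 * GIB)
over_budget = allow_critical_write(
    free_bytes=9 * GIB + 1024, flood_stage_free_bytes=10 * GIB,
    write_bytes=4096, headroom_bytes=1 * GIB)
```

Here `within_budget` is true and `over_budget` is false; the open question is what value the headroom should take and whether it should be configurable.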

@jaymode jaymode added >enhancement :Core/Infra/Core Core issues without another label :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) labels Oct 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@elasticmachine
Collaborator

Pinging @elastic/es-security (:Security/Authentication)

@elasticmachine elasticmachine added Team:Core/Infra Meta label for core/infra team Team:Security Meta label for security team labels Oct 27, 2020
@jaymode jaymode mentioned this issue Oct 27, 2020
23 tasks
@tvernum
Contributor

tvernum commented Oct 28, 2020

Identity Provider operations

There's nothing that writes to the security indices here.

@rjernst rjernst added the needs:triage Requires assignment of a team area label label Dec 3, 2020
@rjernst rjernst removed the needs:triage Requires assignment of a team area label label Dec 18, 2020
4 participants