Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Path to Deprecating the Use of Thread Context for Security Information #2860

Open
davidlago opened this issue Jun 14, 2023 · 21 comments
Open
Labels
enhancement New feature or request question User requested information triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.

Comments

@davidlago
Copy link

Background

The primary objective of this initiative is to substantially enhance the security of OpenSearch by transitioning from the current practice of storing security context information, including username, roles, and other metadata, within the thread context, to a more robust, dedicated, and immutable security context.

In the existing model, the thread context, which is accessible by all components in the Java thread including plugins, holds the security context. Since plugins can modify the thread context, it's possible for a plugin to escalate its privileges by altering the roles in the thread context. This not only a brittle approach but also places an undue burden on plugin developers, who must ensure they're enforcing proper authorization.

While there is work happening on what the extensibility of the product looks like (see OpenSearch Extensibility and Security for Extensions) it is worth exploring what improving the security posture of existing plugins could look like.

There are two problematic areas around the Thread Context approach as a mechanism for authorization in OpenSearch plugins: plugins can read and write to it.

Plugins can read the security context

Plugins that need to make authorization decisions that are not natively supported by the security plugin's "permission vocabulary", have only what they can obtain from the thread context (username, backend roles and roles) to enforce additional authorization. As an example, see how Anomaly Detection does backend role-based authorization.

This behavior is a smell that the security capabilities should be enriched to natively support these use cases and eliminate the need for plugins to reach into the thread context to "figure things out".

Removing this information from the thread context is not possible without modifying the logic inside of plugins needing it to make authorization decisions.

Plugins can write to the security context

This is used today as a way to impersonate the user in for example out-of-band interactions like here in Anomaly Detection.

@cwperks did an excellent job at explaining the context stashing process in #2487

Removing plugins ability to modify the security context is again not possible without modifying the logic in the plugin code.

Straw-man Proposal

I'm opening this RFC as a place to propose and discuss options. Any an all feedback is much appreciated ⭐ As a way to get the creative juices flowing, I'm writing my first stab at tackling this. Understanding that as far as I can see this is a breaking change for plugins as whether via implicit or explicit APIs/contracts we find ourselves in a situation where known plugins need to be updated to work without the use of the thread context.

We could then work on a two-pronged approach:

  1. We can work on a new authorization service (service in the Java parlance, not necessarily a microservice running somewhere else) that exposes an API that Core calls to authorize any action. This API can also fulfill other authorization use cases like a permission check before performing an action that plugins and others can leverage to create better user experiences.
  2. In parallel, this new service can, for backwards compatible reasons, implement the same interface as the thread context does today via a ThreadContextWrapper, accessible only to the security subsystem, so that plugins who are not yet calling the new API can still "read" and "write" to the thread context. I quote those two because it will no longer be the thread context, but rather the new API will translate those requests into whichever new constructs fulfill those. I am not sure this last part is possible for the "write" operation, but smarter people than me will take a stab at it and hopefully figure it out!

For the ThreadContextWrapper, calls from plugins would work as follows:

  • When a plugin requests the security context from the thread context, the call goes to the ThreadContextWrapper instead of the actual thread context. We could do this by introducing a dynamic proxy which intercepts calls to the thread context and routes them to the ThreadContextWrapper instead. In Java, this can be achieved using the Proxy class in the reflection package. When a plugin requests the thread context, it is actually given a proxy. This proxy will forward calls to the ThreadContextWrapper, which will then interact with the actual thread context.
  • The ThreadContextWrapper then translates this request into a call to the new SecurityContext. It gets the security information and returns it to the plugin, making it seem as if it came from the thread context.
  • If a plugin tries to modify the security context in the thread context, the ThreadContextWrapper intercepts the call... and yeah, not sure what happens here. Shrug ASCII art.
sequenceDiagram
    participant P as Plugin
    participant D as DynamicProxy
    participant TCW as ThreadContextWrapper
    participant SC as SecurityContext

    Note over P,SC: Initialization
    P->>D: Get current security context
    D->>TCW: Forward request to ThreadContextWrapper
    TCW->>SC: Request current security context
    SC-->>TCW: Returns current SecurityContext instance
    TCW-->>D: Returns current security context
    D-->>P: Returns current security context

    Note over P,SC: Attempt to modify security context
    P->>D: Request to modify security context
    D->>TCW: Forward request to ThreadContextWrapper
    Note over TCW: Unsure of what happens here ¯\_(ツ)_/¯
@davidlago davidlago added enhancement New feature or request question User requested information untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jun 14, 2023
@dblock
Copy link
Member

dblock commented Jun 14, 2023

This is a worthy proposal for an incremental improvement that achieves centralizing the security context injecting operation, making it easier to get rid of altogether later. A few comments.

  1. Feels like ThreadContextWrapper is only useful to avoid modifying logic in plugins. It could also be useful to log where security context ingestion operations happen, but in any case you need to update plugin code. What are the risks of not doing so and just using the new SecurityContext APIs other than the fear that we might have broken "something"? My take is that if you have to modify the plugin code anyway, you might as well go to the new interface.
  2. Are there new APIs that can define this operation better? This whole process feels like "impersonation" to me. What are the new interfaces in Identity? If they are "login(credentials)", and "impersonate(token)", then SecurityContext is really Identity#login and context stuffing is Identity#impersonate.

@davidlago
Copy link
Author

Thanks for the feedback, @dblock.

  1. Feels like ThreadContextWrapper is only useful to avoid modifying logic in plugins. It could also be useful to log where security context ingestion operations happen, but in any case you need to update plugin code. What are the risks of not doing so and just using the new SecurityContext APIs other than the fear that we might have broken "something"? My take is that if you have to modify the plugin code anyway, you might as well go to the new interface.

As my shrug ASCII art explains better than I can, I still don't see a solution where plugin code does not have to be modified. This straw-man proposal is my naïve attempt at "pretend that it is possible and see if we can find a way". If in fact we just can't find away, then I agree... there is no point in overly complicating things with proxies etc if we have to modify plugin code anyway.

  1. Are there new APIs that can define this operation better? This whole process feels like "impersonation" to me. What are the new interfaces in Identity? If they are "login(credentials)", and "impersonate(token)", then SecurityContext is really Identity#login and context stuffing is Identity#impersonate.

This is what I was thinking too... did not want to call it Identity to overly anchor the proposal, but basically that'd be where we'd most likely implement these new interfaces.

@peternied
Copy link
Member

I might re-frame this problem as a lack of interfaces around the security context. The thread context holds a bag of security context data and there is no guidance on how/when it should be used. Plugins have had to solve problems by peaking into that bag, which provided features, but the methods aren't consistent and fragile. If we can offer replacements for inspecting/altering the security context, plugins will have to write less code to onboard securely, have clearly defined states, and be aware of breaking changes.

My take is that if you have to modify the plugin code anyway, you might as well go to the new interface.

👍

Plugins want to perform actions on the cluster in the security context of 1) the user that making the request and in the context of 2) the plugin, by tackling these two problems we could iterate towards the more advanced features like job scheduler or impersonation.

What do you think of looking at a specific plugin or two to figure out what interface would be a replacement, then draft a PR into those repositories?

@davidlago
Copy link
Author

What do you think of looking at a specific plugin or two to figure out what interface would be a replacement, then draft a PR into those repositories?

@peternied I think that is a good way to go about this exploration. In a similar vein, in conversations with @dblock he suggested that perhaps we try picking a plugin and have it use the SDK we are building for extensions (albeit still loaded as a plugin, and if it wanted to it'd still could access the thread context, but it will not have to).

@cwperks
Copy link
Member

cwperks commented Jun 14, 2023

IMO the primary problem that plugins try to solve by stashing the context is that they want to operate as themselves and not within a User context. Similar to the work for extensions for service accounts, I think we need a stronger representation of a plugin's identity and empower cluster admins to define what a plugin can and cannot do utilizing its identity.

As of now, users are assigned permissions to do actions on indices through a role. Because of this relationship to a role there is no concept of actual ownership. (i.e. plugin A owns index .x, .y and .z is not possible and system indices are a great example of a use case where plugins stash the context in order to meddle with a system index). By establishing ownership of an index to a plugin identity, the security plugin would be able to provide more secure interactions with a plugin's system index and define a more formal notion of a plugin identity.

@nknize
Copy link

nknize commented Jun 16, 2023

Why not use JPMS here for strong encapsulation and start bringing these security components into a new :libs:opensearch-security library (or we can do it in an OpenSearch module) in core to provide "security" by default?

:libs:opensearch-security module-info.java

exports org.opensearch.security.SecurityContext as an interface the plugins implement to define their access control to core.

Add a new SPI provider where plugins provide their security context through this service. This can include a definition of system indexes the plugin "owns".

Restrict ThreadContext to just the core by only exposing the class to internal core mechanisms as follows:

exports org.opensearch.core.common.concurrent to org.opensearch.server, <list of internal modules we want to expose access such as a security module>

We're already moving this direction through the core refactor effort, specifically for this level of strong encapsulation for enhanced security given JSM deprecation (see the issue description for this exact justification for moving to use JPMS to follow JEP 403). And we're working towards defining what we want to backport to 2.x to limit the blast radius.

We should weave in the changes here as part of that effort to enable default security extensions in the core library (which subsequently enables serverless and cloud native implementations to use the same mechanisms without taking a bloated dependency on :server).

@davidlago
Copy link
Author

Thanks @nknize. I was aware of opensearch-project/OpenSearch#1687 as a discussion on the JSM deprecation, but had't read opensearch-project/OpenSearch#5910. I'll do that now :)

@scrawfor99
Copy link
Collaborator

[Triage] This is a RFC so anyone interested should engage with this issue! Thank you for filing @davidlago.

@scrawfor99 scrawfor99 added triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable. and removed untriaged Require the attention of the repository maintainers and may need to be prioritized labels Jun 19, 2023
@nknize
Copy link

nknize commented Jun 20, 2023

Cross referencing the modularization phased meta issue: opensearch-project/OpenSearch#8110 The issue itself is a bit of a WIP as there are a lot of rogue packages across the core repo that are split and need to be resolved. I'll be updating this issue as I comb the codebase and referencing the corresponding refactor PRs that resolve the split package conflicts.

@reta
Copy link
Collaborator

reta commented Jun 20, 2023

We should weave in the changes here as part of that effort to enable default security extensions in the core library (which subsequently enables serverless and cloud native implementations to use the same mechanisms without taking a bloated dependency on :server).

I think this is a key (JPMS may help as well but won't save the day). The problem as I see it is that plugins have access to mostly all server APIs (if not all), including direct access to the thread local storage (#2487).

The help in the understanding what those core APIs and default security extensions are might come from the extensions: those are inherently out of process and won't be able to tamper thread local storage etc however they still may need to execute certain actions on the cluster. I might be wrong but unifying the core APIs between plugins and extensions (the underlying implementation will be different), we could also drop the thread local context as the carrier for security context.

@nknize
Copy link

nknize commented Jun 20, 2023

...we could also drop the thread local context as the carrier for security context.

Yes! That's part of what I'm suggesting here. Let plugins provide their security posture through an SPI provider (just one idea). ThreadContext would be strongly encapsulated to core only through JPMS (not deprecated).

@cwperks
Copy link
Member

cwperks commented Jun 20, 2023

I have been working towards a model of centralized scheduled job identity management to also deprecate the usage of roles injection that many plugins use to evaluate privileges outside of an active user context. On this branch I have created a notion of a durable (unstashable) section of the threadcontext that is immutable and always present even if a plugin stashes the threadcontext to operate in privileged mode.

For the use-case of scheduling jobs, Plugins will typically register a jobs index as a system index with the security plugin (See install_demo_configuration.sh). In order to write the job definition to that system index the plugin will first use common-utils to parse the serialized User out of the ThreadContext:

  1. Security Plugin > PrivilegeEvaluator > setUserInfoInThreadContext
  2. Common Utils > User > parse

After parsing the User from the ThreadContext they will then stash the context in order to write to the jobs index. When writing the job to the index, they will often include a copy of the user directly in the job index with the mapped roles (See bottom) of the user. See these mappings for the .opendistro-anomaly-detection-jobs index.

When Job Scheduler determines that a detector is due to be run it will invoke the runJob method on the job runner and plugins will typically start by injecting roles into the threadcontext - the roles that they saved in their jobs index associated with the job. See: https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/java/org/opensearch/ad/AnomalyDetectorJobRunner.java#L301


What I would like to do with the branch that introduces the durable thread context is use this to make associating an identity to a scheduled job opaque to a plugin and remove the need for them to use common-utils to parse the user from the thread context. In order to do that I have a PR open in core to introduce Centralized Scheduled Job Identity Management which is a new extension point that an IdentityPlugin can define and can be utilized to put the identity system in the middle of job creation and job invocation. On job creation, the Job Scheduler can call on the identity system to associate the current user with the scheduled job and with a durable threadcontext the identity system can pick that information directly from the threadcontext and not worry if that information has been stashed or not.

I know this post has been more about deprecating roles injection rather than stashing the threadcontext, but it does illustrate one of the issues with stashing the thread context as it pertains to scheduled jobs.


  1. mapped roles are the result of roles mapping:
  • Direct user mapping
  • Backend role mapped to (internal) security role
  • Host mapping (i.e. all requests from 127.0.0.1 are mapped to all_access
  • And_Backend_roles - User must have all backend roles in the list to be mapped to the (internal) security role.

@nknize
Copy link

nknize commented Jun 20, 2023

I have created a notion of a durable (unstashable) section of the threadcontext that is immutable ...

Do you need any other headers from the ThreadContext or do you only need the security headers? Why not decouple the SecurityContext completely?

@peternied
Copy link
Member

@cwperks How would you want a core / plugin to communicate switching context at the end state?

Shiro allow this behavior via its Subject object through an associated thread or by 'running as' and releasing - I think adopting something like these conventions would be a good starting point. What do you think of these possible interfaces?

  • Runnable associateWith(Runnable), when there passed in runnable is wrapped and returned inside a runnable that will assume the context of the current subject.
IdentityService.getServiceAccount(SYSTEM).associateWith(() -> {
   /* Stuff as system*/
}).run();
  • void runAs​(PrincipalCollection principals) & PrincipalCollection releaseRunAs(), which are used to change the current subject to be as if it was that principal and then release the state - pretty much how the context stashing is used today.
try {
   IdentityService.getSubject().runAs(IdentityService.getServiceAccount(SYSTEM));
   /* Stuff as system*/
} finally {
   IdentityService.getSubject().releaseRunAs();
}

@cwperks
Copy link
Member

cwperks commented Jun 21, 2023

Do you need any other headers from the ThreadContext or do you only need the security headers? Why not decouple the SecurityContext completely?

The SecurityContext needs a secure enclave where it can be assured that it permeates the entire request handling and is tamper proof outside of the security system. IMO it could make sense to be a portion of the threadcontext or act very much like the threadcontext where it is accessible on the currently running thread or spawned threads during request handling. My instinct was to add a generic means of setting headers or transient headers that are unstashable in case the feature would be useful outside of the security plugin, but it may be better to make a more formalized concept of a SecurityContext which has a schema instead of the freeform usage of the threadcontext.

@peternied I like the simplicity of what you have written and what you have written looks like it can be a drop in replacement of threadcontext.stashContext. i.e. instead of threadContext.shashContext() { /* execute code within stashed context* }, it could be IdentityService.getSubject().runAs(IdentityService.getServiceAccount(SYSTEM)) { /* inside this block code is executed as system */ }

The problem I was trying to point out with the comment above is that it may be necessary in the context of security for scheduled jobs to have the current user accessible even if the threadcontext is stashed and unstashable threadcontext headers and transient headers help solve that problem.

@DarshitChanpura
Copy link
Member

DarshitChanpura commented Jun 22, 2023

IMO a SecurityContext that includes important user details like name, roles, and backend roles, as well as the plugin identity and any other request relevant info, sounds like good solution. When necessary, these details can be accessed via an API. This approach allows plugins to make API calls to determine whether the current user, within the defined "context," has the necessary privileges to perform a specific task. For example, instead of invoking stashContext(), the AD plugin can now use SecurityContext.authorize(<relevant-args>) and proceed based on the resulting authorization status.

The SecurityContext is located in the core and exposes APIs such as authorize(<relevant-args>). Under the hood, these APIs use the PrivilegesEvaluator to perform evaluations such as determining whether the current user is authorized to perform a requested action and confirming that the plugin has permission to access the system index it is requesting.

The AD plugin, which currently relies on common-utils to retrieve the _opendistro_security_user_info object from the thread context, can now evaluate permissions, including backend-role permissions, using the SecurityContext. As a result, there will be no need to retrieve user data from the thread context.

Instead of injecting user roles, that are saved on the jobs index for each job, into the thread context when starting job execution, each job runner can also fetch roles information real-time from SecurityContext to avoid stale roles issue (not sure how to handle this with external IdPs yet). Maybe we store a replica of external IdP user that gets update real-time, idk yet, but there are alternatives that can be considered.

I'm in favor of the idea of associating an identity with each plugin. This would add an ability to perform authorization on plugins trying to access a system index.

@nknize
Copy link

nknize commented Jun 22, 2023

The SecurityContext is located in the core and exposes APIs such as authorize(<relevant-args>).

This is what I was suggesting. SecurityContext in the core that's exported by JPMS w/ the proper downstream facing API needed by the plugins or extensions.

The AD plugin, which currently relies on common-utils to retrieve the _opendistro_security_user_info object from the thread context...

We should dissolve components of common-utils into proper core libraries (e.g., :libs:opensearch-security defines the SecurityContext class and API)

...each job runner can also fetch roles information real-time from SecurityContext...

Yes, decouple this from ThreadContext. This is where I was thinking SPI could possibly come in useful? Let downstream plugins provide their SecurityContext posture from a SecurityContextProvider given state of the plugin that associates the identity with the plugin.

@davidlago
Copy link
Author

We should dissolve components of common-utils into proper core libraries (e.g., :libs:opensearch-security defines the SecurityContext class and API)

Agreed. This idea has been surfaced a few times already, last time from @cwperks if memory serves.

@davidlago
Copy link
Author

An additional benefit of pushing the security context behind an interface is that that decoupling from the specific implementation opens the door for pluggable implementations of that API. Supporting different (and perhaps even chainable) authorization strategies behind the .authorize() method would be interesting to explore.

@scrawfor99
Copy link
Collaborator

I think associating identities within core would be a strong option for handling this issue and then removing the entire use. As Peter mentioned, we can make use of the Shiro framework and are able to heavily schematize the use. I do not think we should allow for more than a strictly enforced set of operations. I am happy to implement or help with the conversion if we decide to go this route for removal.

@krishna-ggk
Copy link
Contributor

Really great discussion! +1 to pulling out SecurityContext out into its own interface which greatly improves the posture.

Instead of injecting user roles, that are saved on the jobs index for each job, into the thread context when starting job execution, each job runner can also fetch roles information real-time from SecurityContext to avoid stale roles issue (not sure how to handle this with external IdPs yet). Maybe we store a replica of external IdP user that gets update real-time, idk yet, but there are alternatives that can be considered.

For background job, tokens are typically better fit as outlined in #7573 avoiding IdP dependencies. We should probably discuss that more in that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question User requested information triaged Issues labeled as 'Triaged' have been reviewed and are deemed actionable.
Projects
None yet
Development

No branches or pull requests

9 participants