DLS search performance/canMatch impact #46817

Closed · henningandersen opened this issue Sep 18, 2019 · 13 comments · Fixed by #47816
Labels
>enhancement :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC v7.3.0

Comments

henningandersen (Contributor) commented Sep 18, 2019

During IndexShard.acquireSearcher, the readerWrapper is applied, which populates the DLS bitsets. This causes a performance issue because IndexShard.acquireSearcher is also called during the canMatch phase. In that context the work seems unnecessary: we only need to know whether the shard could match at all, which does not require DLS to kick in. When searching a short time range across a large number of time-based indices, this adds avoidable overhead.
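
For context, the pattern that triggers this is a narrow time-range search fanned out over many time-based indices, which is exactly the case the canMatch pre-filter is meant to handle cheaply. A hedged illustration (the index pattern and timestamp field are assumptions, not taken from the report):

POST /logs-*/_search
{
    "query": {
        "range": {
            "@timestamp": {
                "gte": "now-15m"
            }
        }
    }
}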

Concrete observations: A query that takes 30ms can take 26s for a DLS filtered user. The user has many DLS roles applied, which seems to increase the impact of the issue. The hot threads are dominated by variants of this stack trace:

app//org.apache.lucene.search.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
app//org.apache.lucene.search.DisjunctionDISIApproximation.nextDoc(DisjunctionDISIApproximation.java:55)
app//org.apache.lucene.util.BitSet.or(BitSet.java:95)
app//org.apache.lucene.util.FixedBitSet.or(FixedBitSet.java:271)
app//org.apache.lucene.util.BitSet.of(BitSet.java:41)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetBitsetCache.lambda$getBitSet$2(DocumentSubsetBitsetCache.java:153)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetBitsetCache$$Lambda$5134/0x000000080204e840.load(Unknown Source)
app//org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:433)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetBitsetCache.getBitSet(DocumentSubsetBitsetCache.java:135)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetReader.<init>(DocumentSubsetReader.java:160)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetReader.<init>(DocumentSubsetReader.java:34)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetReader$DocumentSubsetDirectoryReader$1.wrap(DocumentSubsetReader.java:120)
app//org.apache.lucene.index.FilterDirectoryReader$SubReaderWrapper.wrap(FilterDirectoryReader.java:62)
app//org.apache.lucene.index.FilterDirectoryReader.<init>(FilterDirectoryReader.java:91)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetReader$DocumentSubsetDirectoryReader.<init>(DocumentSubsetReader.java:116)
org.elasticsearch.xpack.core.security.authz.accesscontrol.DocumentSubsetReader.wrap(DocumentSubsetReader.java:38)
org.elasticsearch.xpack.core.security.authz.accesscontrol.SecurityIndexReaderWrapper.apply(SecurityIndexReaderWrapper.java:86)
org.elasticsearch.xpack.core.security.authz.accesscontrol.SecurityIndexReaderWrapper.apply(SecurityIndexReaderWrapper.java:42)
app//org.elasticsearch.index.shard.IndexShard.wrapSearcher(IndexShard.java:1262)
app//org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:1240)
app//org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:1224)
app//org.elasticsearch.search.SearchService.createSearchContext(SearchService.java:632)
app//org.elasticsearch.search.SearchService.canMatch(SearchService.java:1007)
app//org.elasticsearch.search.SearchService.canMatch(SearchService.java:1020)
app//org.elasticsearch.action.search.SearchTransportService.lambda$registerRequestHandler$14(SearchTransportService.java:396)
app//org.elasticsearch.action.search.SearchTransportService$$Lambda$3005/0x0000000801b4d840.messageReceived(Unknown Source)

These all run directly in the transport thread.

Encountered on ES v7.3.0, running on Linux.

henningandersen added the :Security/Authorization and v7.3.0 labels Sep 18, 2019
elasticmachine (Collaborator) commented:

Pinging @elastic/es-security

gwbrown (Contributor) commented Sep 18, 2019

I have seen this as well, with similar differences between filtered and unfiltered queries. The hot_threads stack traces look almost exactly the same. In the case I saw, the query filters on the role were fairly straightforward: most were a boolean query of two match queries, although there were a few with wildcard queries. I'll update here if there's anything more I can add.
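
For illustration only, a role query of the shape described here might look roughly like this (a sketch; the field names and values are made up, not taken from the actual roles):

{
    "bool": {
        "must": [
            { "match": { "department": "engineering" } },
            { "match": { "environment": "production" } }
        ]
    }
}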

henningandersen (Contributor, Author) commented:

We discussed this on another channel and while it seems there could be room for improvement here, it is not a bug. I have converted this into an enhancement request instead.

cpmoore commented Oct 8, 2019

I have seen similar slowness in our environment for users who have DLS roles applied. I would be very happy to see this enhancement in a future release.

jimczi (Contributor) commented Oct 9, 2019

I wonder if we can load the bitset eagerly, like we do for nested fields, for instance. This would slow down the refresh of readers, but since the cache is per segment it shouldn't make a difference unless a big merge happened. This wouldn't eliminate the possibility of the bitset being regenerated, since the security cache is bounded, but it could help in the majority of cases (where the number of role queries is under control).
It is really a bug that such a costly operation can happen during the can_match phase, but I don't see how to fix it nicely. We need to run the costly operation to get simple things like the number of docs and the number of deleted docs, and this information is used during the can_match phase, so we cannot skip the operation completely.
We could skip can_match entirely if indices have DLS enabled, but that doesn't feel right either.
@henningandersen can you explain why we don't consider this issue a bug?
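
For reference, the nested-field analogue mentioned above is the eager loading of fixed bitset filters, which, to the best of my recollection (treat the setting name as an assumption), is an index setting that already defaults to true. Shown here only to illustrate the mechanism, with a placeholder index name:

PUT /my-index
{
    "settings": {
        "index.load_fixed_bitset_filters_eagerly": true
    }
}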

cpmoore commented Oct 9, 2019

To give a better idea of the performance impact of DLS, I ran some simple queries as a test.
service.group is a keyword field in our mapping. I redacted the actual values in the query, but you get the gist.

First as my user without any access restrictions:

POST /filebeat-*/_search
{
    "query": {
       "match_all":{}
    }
}
{
    "took": 39,
     ...
}
POST /filebeat-*/_search
{
    "query": {
        "terms": {
            "service.group": [
                "group1",
                "group2",
                "group3",
                "group4"
            ]
        }
    }
}
{
    "took": 470,
     ...
}

Then as a test user, with the terms query from above applied as a DLS query:

POST /filebeat-*/_search
{
  "query":{
  	"match_all":{}
  }	
}
{
    "took": 5383,
     ...
}
POST /filebeat-*/_search
{
    "query": {
        "terms": {
            "service.group": [
                "group1",
                "group2",
                "group3",
                "group4"
            ]
        }
    }
}
{
    "took": 5677,
     ...
}

Am I wrong in believing that the terms query in my request should take roughly the same amount of time as a match_all query from a user who has that terms query as a DLS filter?
Instead it takes 5 seconds longer.
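
For completeness, a role with that terms query as its DLS filter would be defined roughly like this (a sketch; the role name and privileges are assumptions, only the query mirrors the one above):

PUT /_security/role/filebeat_dls_test
{
    "indices": [
        {
            "names": [ "filebeat-*" ],
            "privileges": [ "read" ],
            "query": {
                "terms": {
                    "service.group": [ "group1", "group2", "group3", "group4" ]
                }
            }
        }
    ]
}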

henningandersen (Contributor, Author) commented:

@jimczi I guess this is borderline. ES works fine if configured with a high enough xpack.security.dls.bitset.cache.size, though there is room for lowering some of the search tail latencies by not having to initialize this at search time as new segments are created or data is aged out of the cache.
On the other hand, this is doing extraordinarily long operations (not sure whether that includes IO?) on a transport thread, which could be regarded as a bug, but is not unprecedented.
I am open to either interpretation.

henningandersen (Contributor, Author) commented:

@cpmoore which release are you on? Above 7.3 you might want to set xpack.security.dls.bitset.cache.size to a value higher than the default 50MB when using DLS.
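
For example, a minimal sketch of raising it in elasticsearch.yml (as far as I know this is a static node setting, so each node needs a restart; the value is illustrative):

# elasticsearch.yml on each node holding DLS-protected data
xpack.security.dls.bitset.cache.size: 256mb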

jimczi (Contributor) commented Oct 9, 2019

On the other hand, this is doing extraordinarily long operations (not sure whether that includes IO?) on a transport thread, which could be regarded as a bug, but is not unprecedented.

That's the part that I care about. I think we should fix this, and I am working on a PR to propose a simple solution. To be clear, my worry is not that requests running with DLS are slower than normal requests. That is expected, and sizing the cache is important, even though 50MB (the default) gives some room to cache a fairly large number of documents/bitsets.
My big worry is that we always run the loading of this cache in the can_match phase, and that is unexpected.

Am I wrong in believing that the terms query in my request should take roughly the same amount of time as a match_all query from a user who has that terms query as a DLS filter?
Instead it takes 5 seconds longer.

It is expected, since the first execution of a DLS query eagerly builds the cached version of the role query. As explained above, the result is cached per segment, so the following executions should be comparable to the non-DLS case.

jimczi added a commit to jimczi/elasticsearch that referenced this issue Oct 9, 2019
This change modifies the local execution of the `can_match` phase to not apply
the plugin's reader wrapper (if one is configured) when acquiring the searcher.
We must ensure that the phase runs quickly, and since we don't know the cost
of applying the wrapper it is preferable to avoid it entirely. The can_match
phase can afford false positives, so this is also safe for the builtin plugins
that use this functionality.

Closes elastic#46817
jimczi (Contributor) commented Oct 9, 2019

I opened #47816 to propose a solution, but I am open to alternatives. As a last resort (if we cannot find agreement), we should also consider not running the can_match phase at all when DLS is activated.

cpmoore commented Oct 9, 2019

@henningandersen

I'm on version 7.4.0, just upgraded this week.

I believe that if it were a problem with the cache size being too small, subsequent requests from the same user for the same query would be at least a little faster, as the DLS query should be in the cache having just been used. Right? However, each search consistently takes 5+ seconds.

henningandersen (Contributor, Author) commented:

@cpmoore That is a bit hard to conclude on based on the information available. If the hot_threads output while running the 5-second query looks similar to the stack trace above, I think increasing the cache size should help. I suggest trying it out, for instance with 256MB, if you have the heap space available for it.
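
If helpful, the hot threads can be captured while the slow search is running with something along these lines (the parameters are just a starting point):

GET /_nodes/hot_threads?threads=5&interval=500ms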

cpmoore commented Oct 9, 2019

@henningandersen
I tried this on our test cluster and it did seem to help. Granted, our test cluster is a lot smaller than production.
Am I right in assuming that the more documents the query matches, the faster the cache will fill up?

jimczi added a commit that referenced this issue Oct 14, 2019
This change modifies the local execution of the `can_match` phase to not apply
the plugin's reader wrapper (if one is configured) when acquiring the searcher.
We must ensure that the phase runs quickly, and since we don't know the cost
of applying the wrapper it is preferable to avoid it entirely. The can_match
phase can afford false positives, so this is also safe for the builtin plugins
that use this functionality.

Closes #46817