DLS search performance/canMatch impact #46817
Pinging @elastic/es-security
I have seen this as well, with similar differences between filtered and unfiltered queries. The hot threads stack traces look almost exactly the same. In the case I saw, the query filters on the role were fairly straightforward: most were a boolean query of two …
We discussed this on another channel, and while it seems there could be room for improvement here, it is not a bug. I have converted this into an enhancement request instead.
I have seen similar slowness in our environment with users to whom DLS roles are applied. I would be very happy to get this enhancement in a future release.
I wonder if we can load the bitset eagerly like we do for nested fields, for instance. This would slow down the refresh of readers, but since the cache is per segment it shouldn't make a difference unless a big merge happened. This wouldn't eliminate the possibility for the bitset to be regenerated, since the security cache is bounded, but it could help in the majority of cases (where the number of role queries is under control).
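The eager-loading idea above might be sketched roughly like this. This is a minimal illustration in plain Java; the class and method names (`EagerBitsetWarmer`, `onNewSegment`) are hypothetical stand-ins, not Elasticsearch's actual `BitsetFilterCache` internals, and the bitset "build" is a placeholder for actually running the role query against the segment:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: pre-build DLS bitsets for every known role query
// when a refresh exposes a new segment, instead of lazily on first search.
public class EagerBitsetWarmer {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final List<String> knownRoleQueries;

    EagerBitsetWarmer(List<String> knownRoleQueries) {
        this.knownRoleQueries = knownRoleQueries;
    }

    // Called when a new segment becomes visible. This shifts the bitset
    // build cost from search time to refresh time; because caching is per
    // segment, an unchanged segment never needs rebuilding unless it is
    // merged away (or evicted from the bounded cache).
    void onNewSegment(String segmentId, int maxDoc) {
        for (String q : knownRoleQueries) {
            cache.computeIfAbsent(segmentId + "|" + q, k -> {
                BitSet bits = new BitSet(maxDoc);
                bits.set(0, maxDoc); // placeholder for "run role query q"
                return bits;
            });
        }
    }

    boolean isCached(String segmentId, String roleQuery) {
        return cache.containsKey(segmentId + "|" + roleQuery);
    }
}
```

With this shape, the first DLS search on a freshly refreshed segment hits the cache rather than paying the build cost on the search path.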
To give a better idea of the performance impact of DLS, I ran some simple queries as a test. First, as my user without any access restrictions:
Then as a test user, with the terms query from above as a DLS query
Am I wrong in believing that the terms query from my request should take roughly the same amount of time as a …
@jimczi I guess this is borderline. ES works fine if configured with a high enough …
@cpmoore which release are you on? Above 7.3 you might want to set …
That's the part that I care about. I think we should fix this and I am working on a PR to propose a simple solution. So to be clear, my worry is not that requests running with DLS are slower than normal requests. This is expected, and sizing the cache is important even though …
It is expected, since the first execution of a DLS query will eagerly build the cached version of the role query. As explained above, the result is cached per segment, so subsequent executions should be comparable to the non-DLS case.
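The lazy, per-segment caching behaviour described here can be sketched as follows. Again the names (`SegmentBitsetCache`, `getOrBuild`) are hypothetical, not the real security cache API: the first access for a given (segment, role query) pair pays the build cost, and later accesses are cache hits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a lazy per-segment DLS bitset cache.
public class SegmentBitsetCache {
    // key: "segmentId|roleQuery" -> cached matching-docs bitset
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private int builds = 0; // counts expensive bitset builds

    // Returns the cached bitset, building it on first access. The build is
    // the expensive part: in the real system it runs the role query over
    // the whole segment.
    public BitSet getOrBuild(String segmentId, String roleQuery, int maxDoc) {
        return cache.computeIfAbsent(segmentId + "|" + roleQuery, k -> {
            builds++;
            BitSet bits = new BitSet(maxDoc);
            bits.set(0, maxDoc); // placeholder for "run the role query"
            return bits;
        });
    }

    public int builds() { return builds; }
}
```

This is why a correctly sized cache matters: if entries are evicted (the cache is bounded), the expensive build is paid again on the next search.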
This change modifies the local execution of the `can_match` phase to not apply the plugin's reader wrapper, if one is configured, when acquiring the searcher. We must ensure that the phase runs quickly, and since we don't know the cost of applying the wrapper it is preferable to avoid it entirely. The can_match phase can afford false positives, so it is also safe for the builtin plugins that use this functionality. Closes elastic#46817
I opened #47816 to propose a solution, but I am open to alternatives. As a last resort (if we cannot find an agreement), we should also consider not running the can_match phase when DLS is activated.
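The shape of the proposed fix might look something like the following. This is an illustrative sketch only, with made-up names (`SearcherScope`, `acquireReader`), not the actual `IndexShard` API from #47816:

```java
// Hypothetical sketch of skipping the plugin reader wrapper when a
// searcher is acquired for the can_match phase only.
public class CanMatchSketch {
    enum SearcherScope { CAN_MATCH, QUERY }

    interface Reader {}
    static class RawReader implements Reader {}

    // Wrapping is where DLS bitsets get built, so it can be expensive.
    static class DlsWrappedReader implements Reader {
        final Reader inner;
        DlsWrappedReader(Reader inner) { this.inner = inner; }
    }

    // can_match tolerates false positives: a shard flagged as matching may
    // still return no hits in the query phase. Skipping DLS here is safe
    // because it can only keep shards in the search that DLS would have
    // filtered to empty; it never hides matching documents.
    static Reader acquireReader(SearcherScope scope, Reader raw) {
        if (scope == SearcherScope.CAN_MATCH) {
            return raw; // fast path: no wrapping, no bitset building
        }
        return new DlsWrappedReader(raw);
    }
}
```

The key design point is that correctness is preserved as long as the unwrapped phase can only over-approximate the set of matching shards.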
I'm on version 7.4.0. Just upgraded this week. I believe that if the problem were the cache size being too small, then subsequent requests from the same user for the same query would be at least a little faster, as the DLS query should be in the cache, having just been used. Right? However, each search consistently takes 5+ seconds.
@cpmoore That is a bit hard to conclude on based on the information available. If the …
@henningandersen |
During `IndexShard.acquireSearcher`, the `readerWrapper` is applied, which populates the DLS bitsets. This causes a performance issue, since `IndexShard.acquireSearcher` is also called during the `canMatch` phase. In that context the wrapping seems unnecessary: we only need to know whether the shard could match at all, which does not require DLS to kick in. When searching for a short time range in a large number of time-based indices, this causes an avoidable performance impact.

Concrete observations: a query that takes 30ms can take 26s for a DLS-filtered user. The user has many DLS roles applied, which seems to increase the impact of the issue. The hot threads are dominated by variants of this stack trace:
These all run directly in the transport thread.
Encountered on ES v7.3.0, running on Linux.
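For context, the shard-skipping decision that `canMatch` makes for time-based indices boils down to an interval-overlap test, roughly as below. This is a simplified illustration (hypothetical names, timestamps as plain longs), not the actual Elasticsearch implementation:

```java
// Sketch of the can_match idea for time-based indices: a shard whose
// timestamp min/max range does not overlap the query range can be skipped
// outright. False positives (keeping a shard that ends up matching nothing)
// are acceptable; false negatives are not.
public class CanMatchRange {
    static boolean canMatch(long shardMin, long shardMax, long queryFrom, long queryTo) {
        // Overlap test between [shardMin, shardMax] and [queryFrom, queryTo].
        return shardMax >= queryFrom && shardMin <= queryTo;
    }
}
```

Because this check only needs per-shard min/max metadata and never looks at individual documents, applying a DLS reader wrapper here buys nothing while paying the full bitset-build cost.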