Document Field Level Security on Frozen Tier not working correctly #82044
Comments
Pinging @elastic/es-security (Team:Security)
@madisonb Thank you for reporting problems with Elasticsearch! Unfortunately I'm having trouble replicating the issue.
For the record, I have tried on a local 7.16.2 node, with the following settings and the following commands:
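The actual snippets were not preserved in this extraction. A minimal test of this shape, using the standard `_security` APIs, would look roughly like the following; the role, user, and field names are illustrative assumptions, not the maintainer's originals:

```console
# Illustrative FLS role: read on pulse* with one field withheld
POST /_security/role/fls_test_role
{
  "indices": [
    {
      "names": [ "pulse*" ],
      "privileges": [ "read" ],
      "field_security": {
        "grant": [ "*" ],
        "except": [ "internal_only_field" ]
      }
    }
  ]
}

# Illustrative test user assigned that role
POST /_security/user/fls_test_user
{
  "password": "a-sufficiently-long-password",
  "roles": [ "fls_test_role" ]
}
```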
@madisonb You would help us a lot if you could narrow down the reproduction a bit. @ywelsch do you have any idea or hint on this?
@madisonb can you run the same failing command again with …?
@albertzaharovits did you push data up into the index and let it roll over into the frozen tier? My issue is when actual data rolls into the partial tier.

Is there something specific you would like to see from the index mapping? The setup is using a component template and is roughly 800 lines long, and I would prefer not to post the entire thing as it exposes how our product is configured (for better or worse). I'm happy to provide a select/specific section if you'd like; we have dynamic templates, aliases, and traditional properties with all kinds of complex mappings that cover text, vectors, numbers, field data filters, geo, custom normalizers, etc.

@ywelsch Running the following command doesn't seem to produce an error trace... Here's a snippet:
Do I have that right? Role definition that fails, from my example above:
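The role JSON itself is missing from this extraction. Going by the surrounding description (read access on the alias plus the frozen `partial-*` indices, with field level security enabled), the failing role would have been of roughly this shape; the role and field names are illustrative guesses:

```console
POST /_security/role/pulse_reader
{
  "indices": [
    {
      "names": [ "pulse", "pulse-*", "partial-pulse-*" ],
      "privileges": [ "read" ],
      "field_security": {
        "grant": [ "*" ],
        "except": [ "internal_only_field" ]
      }
    }
  ]
}
```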
Updating the role via Kibana to allow the …
I can flip between these two role configurations to toggle the errors off and on for the user. The other interesting thing, as I am trying to get a better error trace for you, is that I seem to be able to directly query the indices I care about that I know are frozen when the erroneous role is turned on without issue, but querying the alias does hit the errors.
Do you get an error stack trace if you specify …?
@madisonb Thanks for the details so far. Alas, I'm still unable to reproduce.

I have applied the policy manually, and my test policy triggers the … I have created an alias (like 'pulse') that points to both partially mounted indices and regular indices (indices that are in the frozen state, but not yet snapshotted and mounted, i.e. in the 'wait-for-shard-history-leases' and 'segment-count' steps), and querying the alias works fine for me.

I understand that the issue only manifests when querying the alias? Can you confirm? If we can't have the mapping, can you show the output of …? Are you able to contact Elastic support and mention this issue? (I assume you'd be more comfortable sharing the information that way.)
So perhaps I am having issues somewhere within the index pattern permissions or alias configuration?
ILM Policy
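The policy body was stripped from this extraction. A policy matching the description elsewhere in the thread (hot phase with rollover, data moved to frozen after 12 hours via a searchable snapshot) would look roughly like this; the policy name, timings, and repository name are illustrative, not the reporter's actual values:

```console
PUT _ilm/policy/pulse_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "12h" }
        }
      },
      "frozen": {
        "min_age": "12h",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "found-snapshots"
          }
        }
      }
    }
  }
}
```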
I can contact Elastic Support and reference this ticket as well.
I take from this that the issue also manifests when the search ONLY includes searchable_snapshot shards. Moreover, the same search request issued against selected searchable_snapshot shards encounters no such errors. In addition, the issue only happens if the user has FLS controls. I haven't seen anything suspicious in the ILM policy. This is an odd one.

Can you query each index individually, to verify if there's one single index/shard in a funny state which is then tripping the whole search request?
So my specific user I've created does not have permission to `_cat/indices`, so I used another user to execute the `_cat` portion, and then the regular user per above to do the `_count`.

Returns:

All indices were successful, no errors.
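The commands and their output were stripped from this extraction; the per-index check described above would have been along these lines (index names are illustrative):

```console
# As the admin user: list the backing indices
GET _cat/indices/*pulse*?v&h=index,status,health

# As the restricted user: count documents in each index individually
GET /partial-pulse-000001/_count
```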
@madisonb what if you select more than one index at once (but not via the alias)? For example, what does … return?
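The concrete command was lost in extraction; a multi-index request of the kind being suggested (targeting several indices directly, bypassing the alias) would look like this, with illustrative index names:

```console
GET /partial-pulse-000001,partial-pulse-000002/_count
```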
Let's also see if there are other ways to get the stack trace for the exception. Can you try the following command to check if that returns a stack trace as part of the error response?

As a last alternative (if the above does not yield a stack trace), you can try temporarily setting the log level of …
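The exact commands were not preserved here. Two standard ways to coax a stack trace out of Elasticsearch, which may be what was suggested, are the `error_trace` request parameter and raising a logger's level via the cluster settings API; the logger name below is an illustrative guess, not from the original:

```console
# Ask for full stack traces in the error response
GET /pulse/_search?error_trace=true
{
  "size": 0
}

# Temporarily raise a search-related logger to DEBUG
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.action.search": "DEBUG"
  }
}
```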
I tried a couple of different combinations of numbers (I have no real guess as to which corrupted index it might be) and stumbled upon something. It looks like indices 000001-000005 all have the same exception thrown (below):
I was hoping this was a lead, but it looks like the ILM policy and … So I went back to trying to use the …
I tried all 10 combinations of those indices to see if there was something off, but nothing hit in terms of a nice exception for you.
Moving on to your second comment, the additional query parameters did yield something. Attached is the exception that I hope you will find helpful. I have not applied the …
Excellent. This was exactly what I was looking for. Will do some digging and get back to you.
It's a bug (indirectly) introduced by #78988 (can_match now applies the security wrappers). We will need to add lazy initialization for the …
The reason this was so hard to reproduce is that it came only into play with a lot of shards, where we needed results of some shards to trigger an optimization (can_match check) on later shards (introduced by #51708) that ultimately ran into the problem here.

Interestingly enough, only the optimization in #51708 brought the issue to light (as it results in a hard failure), as regular can-match phases that fail just silently fall back to full query phases (= extra work).

Fix in the works here: #82521
Field level security was interacting in bad ways with the can-match phase on frozen tier shards (interaction between FieldSubsetReader and RewriteCachingDirectoryReader). This made can-match phase fail, which in the normal case would result in extra load on the frozen tier, and in the extreme case (in interaction with #51708) made searches fail. This is a bug that was indirectly introduced by #78988. Closes #82044
@ywelsch thank you for your assistance, I wasn't crazy! Looks like the fix is slated for 7.17 and beyond, cheers.
Correct! Thanks again for reporting this bug and all the help on getting us this important stack trace.
**Elasticsearch version** (`bin/elasticsearch --version`): 7.16.2 (running inside Elastic Cloud)

**Plugins installed**: []

**JVM version** (`java -version`): n/a (Elastic Cloud)

**OS version** (`uname -a` if on a Unix-like system): n/a (Elastic Cloud)

**Description of the problem including expected versus actual behavior**:
I seem to be running into an issue where Field Level Security throws a null exception when operating on frozen indices.
I have a simple ILM policy for my index that moves data from Hot to Frozen after 12 hours. Within that data set, I would like to grant access to all fields except for a few specific ones that I would like to remain internal only.
If I create a new user and grant them a custom role with field level security (allowing and denying specific fields), that user cannot search for anything beyond my hot data tier without getting the following exception back:
Within the data access role, if I disable "Grant access to specific fields", the user can see and return results from the frozen tier.

I will note that in my current environment, this role is also using a "Grant read privileges to specific documents" templated query; however, that does not seem to have an impact on this issue. I have tried to produce a working example below that does not involve that privilege.

**Steps to reproduce**:
1. Create a simple ILM policy that rolls data out of a hot index and into a frozen index.
2. Index data into your ILM managed index so that you have both hot data AND frozen data within your cluster. If my ILM index alias was called `pulse`, my underlying indices are `pulse-0001`, `pulse-0002`, etc., and the frozen indices look like `partial-pulse-0001`, `partial-pulse-0002`, etc.
3. Create a new role that grants read access to your desired indices, like below (I am using Kibana):
4. Create a new user, assign them typical access to a Kibana space, and grant them the data role from step 3.
5. In a new private browser, log in as your new user and validate that they have access to your frozen tier data and hot tier data, by viewing the Discover panel and looking at a time range that spans hot and frozen tiers (24 hrs in my case, see below as an example).
6. Update the role to enable "Grant access to specific fields", and deny a field in your data (see below as an example).

Note in the screenshot above that my data is cut off arbitrarily, right near my frozen tier rollover line from my ILM policy.
But if I try to do an operation on the whole alias that includes frozen shards, I get shard exceptions.
… and it works.
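The actual requests are missing from this extraction; the contrast being described is between something like these two calls (alias and index names are illustrative):

```console
# Against the whole alias (spans hot + frozen shards): fails with shard exceptions
GET /pulse/_count

# Against an individual index directly: works
GET /pulse-0001/_count
```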
I have also tried combing through the built-in roles for Elastic, as well as the built-in index privileges, to see if there was anything related to the frozen tier specifically that causes this behavior, without much luck.
Provide logs (if relevant):
I have tried to comb the logs inside of Elastic Cloud but the UI does not seem to be surfacing this exception where I can find it.