-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partial Evaluation produce duplicate rules #4516
Comments
Thanks for the detailed report. 👀 I'm afraid I'm not sure I fully understand what you're referring to as duplicates here. The first block seems impossible, given that it contains both
and
So this query can't ever be true. 🤔 Maybe you refer to those as duplicates?
Or am I completely on the wrong track here? Generally when it comes to PE results, "minimality" isn't really the first concern. Do you worry about excluding impossible queries, or about making the queries simpler..? 🤔 |
@srenatus Thanks a lot for your reply 👍 I am actually having some performance issues here. And my concerns probably cover both. i.e.: the compile result should
|
I've just now realised that there's everything I need to reproduce this in the first message 😅 I'll play some with this locally. Usually, I prefer |
Let's see where the queries comes from... 🔍 Taking this (from object/record/allow.rego) for an example, # delegated to draft dataset rules
allow {
trace(input.object.allow)
isDraftDataset
[resourceType, operationType, resourceUriPrefix] := breakdownOperationUri(input.operationUri)
inputOperationUri := concat("/", ["object", "dataset", "draft", operationType])
# proxy to dataset related permission decision
verifyDatasetPermission(inputOperationUri, "record")
} I'm adding Now, running full outputallow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
not input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
not input.object.record["access-control"].ownerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"0e0ba07e-bda4-4adb-bca2-c44c3984ddb7" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"1b3736aa-fc09-439e-bb8e-e1b03551add9" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"277d01aa-a6b9-4ee8-8519-717e2f4da6d4" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"8fd9486c-5e07-438a-8025-4dc9b99221dd" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"c59aa487-32fc-4d00-b16c-e070f2c88943" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"d4f8eb99-af81-4629-8056-bb261d51f2ba" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"e440f319-304e-4ffe-9b93-2b40adfe5aba" = input.object.record["access-control"].ownerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
not input.object.record.publishing.state
input.object.record.publishing.state = "draft"
"ff603803-5b64-448f-bf90-a2c9bff5134e" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
not input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
not input.object.record["access-control"].ownerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"0e0ba07e-bda4-4adb-bca2-c44c3984ddb7" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"1b3736aa-fc09-439e-bb8e-e1b03551add9" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"277d01aa-a6b9-4ee8-8519-717e2f4da6d4" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"8fd9486c-5e07-438a-8025-4dc9b99221dd" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"c59aa487-32fc-4d00-b16c-e070f2c88943" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"d4f8eb99-af81-4629-8056-bb261d51f2ba" = input.object.record["access-control"].orgUnitOwnerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"e440f319-304e-4ffe-9b93-2b40adfe5aba" = input.object.record["access-control"].ownerId
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
input.object.record.publishing.state = "draft"
"ff603803-5b64-448f-bf90-a2c9bff5134e" = input.object.record["access-control"].orgUnitOwnerId
} Looking at the latest orgUnitOwnerId thing, we find two blocks for allow {
trace(input.object.allow)
input.object.record["dataset-draft"]. # isDraftDataset body 1
not input.object.record.publishing.state. # isDraftDataset body 1
input.object.record.publishing.state = "draft" # verifyDatasetPermission("object/dataset/draft/read", "record")
"ff603803-5b64-448f-bf90-a2c9bff5134e" = input.object.record["access-control"].orgUnitOwnerId # verifyRecordPermission
}
allow {
trace(input.object.allow)
input.object.record["dataset-draft"] # isDraftDataset body 2
input.object.record.publishing.state = "draft" # isDraftDataset body 2
input.object.record.publishing.state = "draft" # verifyDatasetPermission("object/dataset/draft/read", "record")
"ff603803-5b64-448f-bf90-a2c9bff5134e" = input.object.record["access-control"].orgUnitOwnerId # verifyRecordPermission
} I've annotated the expression with the rule body expression they're descendant from. So, duplicate rules are silly, but getting rid of them should also be a straightforward improvement to the PE code. I'll make a note of that. What's more interesting is the impossible case, i.e., the first rule body. PE doesn't keep track of what unknowns could be, so it won't be able to tell on its own that this is impossible. I wonder if by restructuring the rego code, we can tell it. I think we get the number down by replacing isDraftDataset {
input.object.record["dataset-draft"]
input.object.record.publishing.state = "draft"
}
isDraftDataset {
input.object.record["dataset-draft"]
not input.object.record.publishing.state
} with isDraftDataset {
input.object.record["dataset-draft"]
object.get(input.object.record, ["publishing", "state"], "draft") == "draft"
} but that'll make the I'm assuming you're reading the PE results and convert that to something else? I'm asking because if you were to evaluate them with OPA again, the rule indexing feature would have an impact -- but if you don't use that, it's not worth exploring. |
@srenatus Thanks so much for looking into it. Also appreciated the We actually use OPA to make decisions on data filtering. The policy will be partially evaluated with context data like user attributes etc. and "residual" rules (if any) will then be translated to either SQL or elasticsearch DSL (we store the same record in different "shapes" in both PostgreSQL & Elasticsearch for different purposes) to filter out the data that a user is not allowed to access. Right now, our implementation works correctly for all our use cases in terms of functionality/access control. But I guess the performance issue might be a blocker for us as it seems to get worse when we assign more permissions to the users. We have already implemented some logic to filter out duplicated rules from the compiled result in our codebases. For some requests, it reduces the AST parse / process time from 10s to 2s. I can tell we probably can still improve our parse further. However, as our parser is written in typescript, I guess the performance will be much better (and might be easier) if we can address some of the issues in OPA. In fact, instead of changing the OPA PE core logic/processing pipelines to address the issues, I was wondering if it would be possible to add some "post-processing logic" to address those issues just before outputting the PE result? Those "post-processing logic" could be simply string-comparison based rule/expression filtering as we probably don't have to eliminate 100% duplicated rules / expression --- even solving 80% of cases will likely be a big improvement. I re-ran the policy using https://gist.github.com/t83714/69f4aadaa6f2191455f5357a9bfe0879 I think there are 3 issues that we could possibly address with "post-processing logic":
The following rule appears twice at line 3849 and line 3856. Before OPA outputs the PE result, is it possible to filter out rules with the same rule body by simple string (e.g. AST JSON) comparison? allow {
input.object.record["dcat-dataset-strings"]
input.object.record.publishing.state = "published"
input.object.record.publishing.state = "published"
not input.object.record["access-control"].orgUnitId
}
Still take the rule above for instance: allow {
input.object.record["dcat-dataset-strings"]
input.object.record.publishing.state = "published"
input.object.record.publishing.state = "published"
not input.object.record["access-control"].orgUnitId
} The expression For all rules that OPA is about to output, could we possibly implement some simple "post-processing logic" to filter out the exact same expressions by simple string (e.g. AST JSON string) comparison?
Thanks for your Agree it's a tricky issue to solve at "PE evaluation engine"'s point of view. e.g. the rule below is found at line 3827: allow {
input.object.record["dcat-dataset-strings"]
not input.object.record["dataset-draft"]
not input.object.record.publishing.state = "draft"
not input.object.record["dcat-dataset-strings"]
not input.object.record.publishing.state = "published"
input.object.record.publishing.state
input.object.record.publishing.state = "published"
"ff603803-5b64-448f-bf90-a2c9bff5134e" = input.object.record["access-control"].orgUnitId
} The existence of expression could we possibly address this case by implementing a "post-processing logic" that:
This approach might not logically solve all cases. But I believe (and based on my tests) it will help most real-life cases. Thanks again for your time in looking into it. |
Some sort of post-processing logic in OPA's PE machinery would make sense, I think. Just need to figure out here the best place to put it would be 🤔 |
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. |
…sult body Before, we'd have added expressions multiple times. Also, it was possible to generate rule bodies that would never work out because they carried both expr and not-expr for the same expr. Now, we're checking these two things before adding an expression to the result in copypropagation. Fixes open-policy-agent#4516 for simple cases. It's still the case that we may end up with impossible bodies, like not input.foo input.foo = "bar" but at least we won't have input.foo.bar not input.foo.bar Signed-off-by: Stephan Renatus <stephan.renatus@gmail.com>
I've given this a go with some code on this branch and the number of |
@srenatus Thanks a lot for looking into it. I can confirm that I also got signification improvement on our latest policy files --- down from 284 to 112 rules (i.e. 60% improvement 🎉 ). Here is how I did my test:
Here are the output files: One thing I noticed --- although the result after fix achieved amazing result, it still contains some identical rules (e.g. first & second rules. Please see below). Not trying to be greedy 😄 . But would it be hard to filter out those identical rules as well? Here is one example of duplicated rules (duplicated twice): allow {
input.object.record["dcat-dataset-strings"]
not input.object.record.publishing.state
not input.object.record["access-control"].orgUnitId
} |
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. |
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. |
Partial Evaluation produce duplicate rules
"run", "--server", "/policies"
General Summary
The partial evaluation result might include many duplicate / identical rules sometimes.
The sample response provided above includes 432 rules. Many of those are duplicated / identical rules. e.g. like (I converted AST into a more readable format):
or
Those above are example rules that are duplicated more than once.
Steps To Reproduce
opa run -s [folder of policy files]
Expected behaviour
The partial evaluation result should not contain duplicate rules. But right now you can locate duplicate rules in the partial evaluation result.
The text was updated successfully, but these errors were encountered: