Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-deduplication of sets #429

Closed
timothyhinrichs opened this issue Aug 28, 2017 · 3 comments · Fixed by #3067
Closed

Non-deduplication of sets #429

timothyhinrichs opened this issue Aug 28, 2017 · 3 comments · Fixed by #3067
Assignees

Comments

@timothyhinrichs
Copy link
Member

When defining a set, we have two options: using a partial document or a set-comprehension. The two are logically equivalent without aggregation, but with aggregation you the two produce different results. The partial document only eliminates duplicates once you ask for the set as a whole at the end of the search; it leaves duplicates in tact during search. The result is that the same value can be included more than once in an aggregate. In contrast, the set-comprehension eliminates duplicates immediately.

The following two code snippets are identical (and are taken from the Terraform library).

# Set-comprehension
resource_types = all {
    all = {y | input[name]; split(name, ".", outs); y = outs[0]}
}
# Partial document
resource_types_aux[y] {
    input[name]
    split(name, ".", outs)
    y = outs[0]}
}
# Force duplicate elimination
resource_types = resource_types_aux

Two options for addressing.

  • Improve debuggability so such issues become apparent immediately. This may be difficult because it seems that the time of de-duplication is not something people will naturally think about.
  • Eliminate duplicates more proactively. People will use the 'partial-doc set' because it's syntactically easier than the set-comprehension, both to write and to debug. They probably won't think about it as controlling the de-duplication logic. OPA could analyze the policy to identify when de-duplication must occur proactively (since the caller is an aggregate). The downside is that if someone tries to debug by calling lower-level docs without using an aggregate, the results will be different when she calls the lower-level doc using an aggregate.
@tsandall
Copy link
Member

tsandall commented Oct 17, 2018

@timothyhinrichs do you have any other thoughts on this? I'm tempted to close for now as we this hasn't come up for quite a while and I'm not sure there is a good solution.

@tsandall
Copy link
Member

At minimum we need to clearly document the behaviour.

@timothyhinrichs
Copy link
Member Author

During evaluation, each partial set could record the elements found so far and fail when it encounters an element already in the set (perhaps with a better trace error message). Besides being more intuitive, this would also improve performance. Tracing will look different.

tylercloke added a commit to chef/automate that referenced this issue Oct 23, 2019
…ld result in *

This is an edge case where the duplicates would be the same length as the total projects in the system.

For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project).

V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change:

c21ddbe

Due to the introduction of using a partial document:

c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241

Which unfortunately contain duplicates:

open-policy-agent/opa#429

Signed-off-by: Tyler Cloke <tylercloke@gmail.com>
tylercloke added a commit to chef/automate that referenced this issue Oct 23, 2019
…ld result in *

This is an edge case where the duplicates would be the same length as the total projects in the system.

For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project).

V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change:

c21ddbe

Due to the introduction of using a partial document:

c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241

Which unfortunately contain duplicates:

open-policy-agent/opa#429

Signed-off-by: Tyler Cloke <tylercloke@gmail.com>
tylercloke added a commit to chef/automate that referenced this issue Oct 24, 2019
…ld result in *

This is an edge case where the duplicates would be the same length as the total projects in the system.

For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project).

V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change:

c21ddbe

Due to the introduction of using a partial document:

c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241

Which unfortunately contain duplicates:

open-policy-agent/opa#429

Signed-off-by: Tyler Cloke <tylercloke@gmail.com>
tylercloke added a commit to chef/automate that referenced this issue Oct 24, 2019
…ld result in * (#1963)

This is an edge case where the duplicates would be the same length as the total projects in the system.

For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project).

V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change:

c21ddbe

Due to the introduction of using a partial document:

c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241

Which unfortunately contain duplicates:

open-policy-agent/opa#429

Signed-off-by: Tyler Cloke <tylercloke@gmail.com>
@tsandall tsandall added this to TODO (Things That Should Be Done) in Open Policy Agent via automation Sep 3, 2020
@tsandall tsandall moved this from TODO (Things That Should Be Done) to Planned (Things We Are Going To Do) in Open Policy Agent Sep 3, 2020
@tsandall tsandall moved this from Planned (Things We Are Going To Do) to In Progress in Open Policy Agent Jan 11, 2021
@tsandall tsandall self-assigned this Jan 11, 2021
tsandall added a commit to tsandall/opa that referenced this issue Jan 15, 2021
This commit fixes an old issue in the evaluator whereby duplicate keys
could be returned when iterating over documents generated by partial
rules. Specifically, because the code path to evaluate partial sets
and partial objects differs when elements are accessed individually
versus in aggregate (e.g., `p[x] = y` vs `p = q; q[x] = y`) callers
could see different results in some cases. For example:

```
[[x,y] | p[x] = y]

[[x, y] | p = q; q[x] = y]
```

In this case, query 1 and 2 could produce different results. For both
partial sets and objects, the result may contain duplicate values. In
addition, for partial objects, the first query may succeed, while the
second query may generate a conflict error if the same key was mapped
more than one value.

The fix simply keeps track of the values that have been generated
while evaluating the rules and performs deduplication/conflict
checking just like in the aggregate case. When duplicates are
encountered, the evaluator will emit Duplicate events to indicate that
evaluation is not continuing.

Fixes open-policy-agent#429

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
Open Policy Agent automation moved this from In Progress to Done Jan 19, 2021
tsandall added a commit that referenced this issue Jan 19, 2021
This commit fixes an old issue in the evaluator whereby duplicate keys
could be returned when iterating over documents generated by partial
rules. Specifically, because the code path to evaluate partial sets
and partial objects differs when elements are accessed individually
versus in aggregate (e.g., `p[x] = y` vs `p = q; q[x] = y`) callers
could see different results in some cases. For example:

```
[[x,y] | p[x] = y]

[[x, y] | p = q; q[x] = y]
```

In this case, query 1 and 2 could produce different results. For both
partial sets and objects, the result may contain duplicate values. In
addition, for partial objects, the first query may succeed, while the
second query may generate a conflict error if the same key was mapped
more than one value.

The fix simply keeps track of the values that have been generated
while evaluating the rules and performs deduplication/conflict
checking just like in the aggregate case. When duplicates are
encountered, the evaluator will emit Duplicate events to indicate that
evaluation is not continuing.

Fixes #429

Signed-off-by: Torin Sandall <torinsandall@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

2 participants