Non-deduplication of sets #429

timothyhinrichs · 2017-08-28T23:19:48Z

When defining a set, we have two options: using a partial document or a set-comprehension. The two are logically equivalent without aggregation, but with aggregation you the two produce different results. The partial document only eliminates duplicates once you ask for the set as a whole at the end of the search; it leaves duplicates in tact during search. The result is that the same value can be included more than once in an aggregate. In contrast, the set-comprehension eliminates duplicates immediately.

The following two code snippets are identical (and are taken from the Terraform library).

# Set-comprehension
resource_types = all {
    all = {y | input[name]; split(name, ".", outs); y = outs[0]}
}

# Partial document
resource_types_aux[y] {
    input[name]
    split(name, ".", outs)
    y = outs[0]}
}
# Force duplicate elimination
resource_types = resource_types_aux

Two options for addressing.

Improve debuggability so such issues become apparent immediately. This may be difficult because it seems that the time of de-duplication is not something people will naturally think about.
Eliminate duplicates more proactively. People will use the 'partial-doc set' because it's syntactically easier than the set-comprehension, both to write and to debug. They probably won't think about it as controlling the de-duplication logic. OPA could analyze the policy to identify when de-duplication must occur proactively (since the caller is an aggregate). The downside is that if someone tries to debug by calling lower-level docs without using an aggregate, the results will be different when she calls the lower-level doc using an aggregate.

The text was updated successfully, but these errors were encountered:

tsandall · 2018-10-17T16:56:51Z

@timothyhinrichs do you have any other thoughts on this? I'm tempted to close for now as we this hasn't come up for quite a while and I'm not sure there is a good solution.

tsandall · 2019-04-12T19:19:48Z

At minimum we need to clearly document the behaviour.

timothyhinrichs · 2019-06-03T21:04:15Z

During evaluation, each partial set could record the elements found so far and fail when it encounters an element already in the set (perhaps with a better trace error message). Besides being more intuitive, this would also improve performance. Tracing will look different.

…ld result in * This is an edge case where the duplicates would be the same length as the total projects in the system. For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project). V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change: c21ddbe Due to the introduction of using a partial document: c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241 Which unfortunately contain duplicates: open-policy-agent/opa#429 Signed-off-by: Tyler Cloke <tylercloke@gmail.com>

…ld result in * (#1963) This is an edge case where the duplicates would be the same length as the total projects in the system. For example, if you have a single custom project named project1 one and two policies that allow user bob on project1 for a specific resource and action, then V2ProjectsAuthorized would return ["project1", "project1"]. Since the total projects in the system would be ["(unassigned)", "project1"], ProjectsAuthorized would incorrectly equate the result of V2ProjectsAuthorized to * due to the fact that their lengths matched. This edge case would happen anytime NumberOfProjectsAllowingAProjectForASpecificResourceAndAction = NumberOfCustomProjects + 1 (for the unassigned project). V2ProjectsAuthorized has been updated to not return duplicates which was never the intended behavior. The duplicates were introduced in this change: c21ddbe Due to the introduction of using a partial document: c21ddbe#diff-2d317bb33ff7babca7e1290b0fb1dcf2R241 Which unfortunately contain duplicates: open-policy-agent/opa#429 Signed-off-by: Tyler Cloke <tylercloke@gmail.com>

This commit fixes an old issue in the evaluator whereby duplicate keys could be returned when iterating over documents generated by partial rules. Specifically, because the code path to evaluate partial sets and partial objects differs when elements are accessed individually versus in aggregate (e.g., `p[x] = y` vs `p = q; q[x] = y`) callers could see different results in some cases. For example: ``` [[x,y] | p[x] = y] [[x, y] | p = q; q[x] = y] ``` In this case, query 1 and 2 could produce different results. For both partial sets and objects, the result may contain duplicate values. In addition, for partial objects, the first query may succeed, while the second query may generate a conflict error if the same key was mapped more than one value. The fix simply keeps track of the values that have been generated while evaluating the rules and performs deduplication/conflict checking just like in the aggregate case. When duplicates are encountered, the evaluator will emit Duplicate events to indicate that evaluation is not continuing. Fixes open-policy-agent#429 Signed-off-by: Torin Sandall <torinsandall@gmail.com>

This commit fixes an old issue in the evaluator whereby duplicate keys could be returned when iterating over documents generated by partial rules. Specifically, because the code path to evaluate partial sets and partial objects differs when elements are accessed individually versus in aggregate (e.g., `p[x] = y` vs `p = q; q[x] = y`) callers could see different results in some cases. For example: ``` [[x,y] | p[x] = y] [[x, y] | p = q; q[x] = y] ``` In this case, query 1 and 2 could produce different results. For both partial sets and objects, the result may contain duplicate values. In addition, for partial objects, the first query may succeed, while the second query may generate a conflict error if the same key was mapped more than one value. The fix simply keeps track of the values that have been generated while evaluating the rules and performs deduplication/conflict checking just like in the aggregate case. When duplicates are encountered, the evaluator will emit Duplicate events to indicate that evaluation is not continuing. Fixes #429 Signed-off-by: Torin Sandall <torinsandall@gmail.com>

tsandall added the design label Aug 29, 2017

tsandall added the pre1.0 label Apr 12, 2019

tsandall added docs and removed design labels Apr 12, 2019

tsandall assigned timothyhinrichs Apr 12, 2019

tylercloke mentioned this issue Oct 23, 2019

Fix bug where duplicate projects getting returned from OPA in 2.1 could result in * chef/automate#1963

Merged

2 tasks

tsandall added this to TODO (Things That Should Be Done) in Open Policy Agent via automation Sep 3, 2020

tsandall moved this from TODO (Things That Should Be Done) to Planned (Things We Are Going To Do) in Open Policy Agent Sep 3, 2020

tsandall unassigned timothyhinrichs Sep 3, 2020

tsandall moved this from Planned (Things We Are Going To Do) to In Progress in Open Policy Agent Jan 11, 2021

tsandall self-assigned this Jan 11, 2021

tsandall mentioned this issue Jan 14, 2021

topdown: Deduplicate when iterating partial rules #3067

Merged

tsandall closed this as completed in #3067 Jan 19, 2021

Open Policy Agent automation moved this from In Progress to Done Jan 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-deduplication of sets #429

Non-deduplication of sets #429

timothyhinrichs commented Aug 28, 2017

tsandall commented Oct 17, 2018 •

edited

tsandall commented Apr 12, 2019

timothyhinrichs commented Jun 3, 2019

Non-deduplication of sets #429

Non-deduplication of sets #429

Comments

timothyhinrichs commented Aug 28, 2017

tsandall commented Oct 17, 2018 • edited

tsandall commented Apr 12, 2019

timothyhinrichs commented Jun 3, 2019

tsandall commented Oct 17, 2018 •

edited