Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reimplement foreign audit handling in resolver #240

Closed
Gankra opened this issue Jun 29, 2022 · 4 comments · Fixed by #273
Closed

reimplement foreign audit handling in resolver #240

Gankra opened this issue Jun 29, 2022 · 4 comments · Fixed by #273
Labels
p1 do now

Comments

@Gankra
Copy link
Contributor

Gankra commented Jun 29, 2022

this code was hacked up in like week 1 and hasn't been touched since.

The main challenge to solve is criteria mapping. Foreign criteria are namespaced, but can be mapped into strict equivalences with local criteria (implies but in both directions). The two builtins should automap, but the user can also specify custom M:N mappings. How to apply that mapping efficiently is a bit of a concern for me. As far as I can tell, you need to do a quadratic amount of work in the worst-case to apply the mappings.

Consider the case of only 2:1 mappings, ([x] is set, [ ] is unset):

mapping 1: [ ][ ][ ][x][x] [ ][ ][ ][ ][ ] == [ ][ ][ ][ ][ ] [x][ ][ ][ ][ ]
mapping 2: [ ][ ][x][ ][ ] [x][ ][ ][ ][ ] == [ ][ ][ ][ ][ ] [ ][x][ ][ ][ ]
mapping 3: [ ][x][ ][ ][ ] [ ][x][ ][ ][ ] == [ ][ ][ ][ ][ ] [ ][ ][x][ ][ ]
mapping 4: [x][ ][ ][ ][ ] [ ][ ][x][ ][ ] == [ ][ ][ ][ ][ ] [ ][ ][ ][x][ ]

Then with this input, we have a cascading series of equalities:

[x][x][x][x][x] [ ][ ][ ][ ][ ]
[x][x][x][x][x] [x][ ][ ][ ][ ]  via mapping 1
[x][x][x][x][x] [x][x][ ][ ][ ]  via mapping 2
[x][x][x][x][x] [x][x][x][ ][ ]  via mapping 3
[x][x][x][x][x] [x][x][x][x][ ]  via mapping 4

At each step we have one criteria that wasn't involved before, and one criteria that only came into existence in the last step. I believe that this shows that you cannot simply do as we currently do with "implies" where you compute some kind of transitive closure ahead of time. Nor can you do a random-access lookup against your currently set criteria for what rules definitely apply.

Seemingly this means that you need to linearly scan over every mapping rule in order and apply them. Then you just need to do it all over again until you hit a fixed-point. Such a process would be O(k * j) with k criteria and j rules. The process will definitely terminate at this point as we are always making progress on each linear scan. Note that this cost would have to be paid for every union_with or set_criteria we do. So if we do g of those, we're talking about O(k * j * g).

One possible way to optimize this would be to have a mapping from individual criteria to all the rules that refer to them. So whenever we apply a rule, we can know what criteria are newly set and only check mappings that involve those criteria. Such a mapping would have O(k * j) size worst-case, and therefore take just as long to compute, but that would be a fixed cost instead of a cost we pay on every operation. Sadly I think you could still tune such an input such that the each entry has O(j) rules so this doesn't gain much in theory. In practice it might gain a lot, as most rules will only refer to a few criteria, and we will go from having to naively check every rule to knowing exactly what rules to check.

This is probably overthinking things but the resolver is a combinatoric time bomb, so shaving off multiplicative factors is worth some consideration. Also I have insomnia and apparently that means algorithms brain.

@Gankra Gankra added the p1 do now label Jun 29, 2022
@Gankra
Copy link
Contributor Author

Gankra commented Jun 29, 2022

See also #16 and #4. Those discussions were a bit hairy because we had such a rough understanding of how the resolver would work and the semantics of criteria. These days I'm pretty confident in the semantics and this seems to make sense.

In the past I was thrown off by wanting to map foreign audit entries into local criteria only, but this proves problematic for dependency-criteria and unmapped criteria. You're punching a lot of semantic holes in their audits and playing with things that look a lot like "variance" on functions with types in the outputs (criteria) and inputs (dependency-criteria), although I think the variance stuff is mostly resolved by making all the mappings "strict equality" instead of only one-way.

I am now of the opinion that it's much simpler to just have all local and foreign criteria live at once, to reapply all equality rules at every step, and to evaluate foreign audits in their native criteria. The above comment is one issue that comes from doing this. The other issue is that the set of criteria we have to track can grow much faster, blowing the budget on CriteriaSet's current static allocation of 64 bits. We may need to move to a hybrid where we can fallback to bitvec. The code has always anticipated this, which is why CriteriaSets aren't Copy and need to be explicitly initialized with their "capacity" (number of criteria). Still I'd like to avoid doing that for as long as we can.

@bholley
Copy link
Collaborator

bholley commented Jun 29, 2022

Would it simplify things if we only allowed 1:1 mappings for now? I'm not sure how widely custom criteria maps will be used, and so I don't want to put too much engineering effort into them before we have a clearer sense of the use-cases.

@Gankra
Copy link
Contributor Author

Gankra commented Jun 29, 2022

Yeah it it's always 1:1 we can do some transitive closure stuff again and use the implies machinery.

@bholley
Copy link
Collaborator

bholley commented Jun 29, 2022

Let's do that for now and then see if it's an issue in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 do now
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants