
RFC: support batch authorization requests #13

Closed
glb opened this issue Jul 10, 2023 · 6 comments

@glb
Contributor

glb commented Jul 10, 2023

While the vast majority of requests likely operate on a single resource and can be handled efficiently by the current implementation, list operations may, depending on an application's permission model, need to check authorization against multiple resources, and doing a full Python-to-Rust roundtrip for each check is inefficient.

This is a proposal to modify the is_authorized signature from:

def is_authorized(request: dict,
                  policies: str,
                  entities: Union[str, List[dict]],
                  schema: Union[str, dict, None] = None,
                  verbose: bool = False) -> AuthzResult:
    ...

where request looks like

{
  "principal": "...entity ID",
  "action": "... action entity ID",
  "resource": "... resource entity ID",
  "context": [JSON-encoded dict or a dict]
}

to one that takes a list of resource entity IDs and returns a list of AuthzResult.
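A minimal sketch of what that modified shape could look like (pure Python, with a stubbed decision in place of the Rust call; `is_authorized_multi` and the `resources` field are illustrative names, not the actual API):

```python
from typing import List, Union


def is_authorized_multi(request: dict,
                        policies: str,
                        entities: Union[str, List[dict]],
                        schema: Union[str, dict, None] = None,
                        verbose: bool = False) -> List[dict]:
    """Hypothetical variant: request["resources"] is a list of entity IDs,
    and one result is returned per resource, in the same order."""
    results = []
    for resource in request["resources"]:
        # Expand the batch request into a per-resource request. In the
        # real implementation this loop would run once in Rust, avoiding
        # a Python-to-Rust roundtrip per resource.
        single = {**request, "resource": resource}
        single.pop("resources", None)
        results.append({"decision": "Deny", "request": single})  # stubbed decision
    return results
```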

An alternative approach suggested by @skuenzli that would maintain backwards compatibility would be to create a new is_batch_authorized or is_authorized_batch (inspired by is_authorized_partial in the existing Cedar API) that takes a list of requests and returns a list of results. To reduce code duplication, internally both functions could call the same underlying Rust implementation. This alternative would also allow more flexibility for the caller to provide alternative principal / action / resource / context in each request. A potential downside of providing a batch of requests is that the context content would need to be provided for each, possibly making the amount of data sent from Python to Rust much larger.
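A sketch of the shape of that backwards-compatible alternative (names and bodies are illustrative; the real function would delegate to the shared Rust implementation rather than loop in Python):

```python
from typing import List, Union


def _check_one(request: dict, policies: str, entities, schema, verbose) -> dict:
    # Stand-in for the existing single-request Rust call; real code would
    # not cross the FFI boundary once per request.
    return {"decision": "Deny", "resource": request["resource"]}


def is_authorized_batch(requests: List[dict],
                        policies: str,
                        entities: Union[str, List[dict]],
                        schema: Union[str, dict, None] = None,
                        verbose: bool = False) -> List[dict]:
    # Takes a list of full requests (each may vary principal / action /
    # resource / context) and returns one result per request, in order.
    return [_check_one(r, policies, entities, schema, verbose) for r in requests]
```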

With a naive implementation, the batch call in the test below is about 5x faster on my test machine, performing a batch of 1000 authorization checks in ~134ms compared to 688ms for 1000 individual calls. A brief test with more entities shows the improvement only gets larger.

| # entities | individual (1000 calls / 1 resource) | batch (1 call / 1000 resources) | speedup |
| ---------- | ------------------------------------ | ------------------------------- | ------- |
| 2          | 688 ms                               | 134 ms                          | 5.13x   |
| 4          | 915 ms                               | 167 ms                          | 5.48x   |
| 8          | 1315 ms                              | 188 ms                          | 6.99x   |
**Naive test script:**
import json
import cedarpy

from timeit import default_timer as timer

from typing import List

policies: str = """
permit(
    principal,
    action == Action::"edit",
    resource
)
when {
    resource.owner == principal
};
"""

entities: List[dict] = [
    {
        "uid": { "__expr": "User::\"alice\""},
        "attrs": {},
        "parents": [],
    },
    {
        "uid": { "__expr": "Photo::\"alices-photo-1\""},
        "attrs": {
            "owner": { "__expr": "User::\"alice\"" }
        },
        "parents": []
    },
    {
        "uid": { "__expr": "Photo::\"bobs-photo-1\""},
        "attrs": {
            "owner": { "__expr": "User::\"bob\"" }
        },
        "parents": []
    }
]

request = {
    "principal": "User::\"alice\"",
    "action": "Action::\"edit\"",
    "resource": "Photo::\"alices-photo-1\"",
    "context": json.dumps({})
}

print("starting")
start = timer()

# make 1000 individual calls to validate alice's access to `alices-photo-1` and `bobs-photo-1`
for i in range(0,500):
    result = cedarpy.is_authorized(
        {"principal":"User::\"alice\"",
        "action":"Action::\"edit\"",
        "resource":"Photo::\"alices-photo-1\"",
        "context":{}},
        policies, json.dumps(entities), None, False)
    result = cedarpy.is_authorized(
        {"principal":"User::\"alice\"",
        "action":"Action::\"edit\"",
        "resource":"Photo::\"bobs-photo-1\"",
        "context":{}},
        policies, json.dumps(entities), None, False)

end = timer()
print("done")

print(end-start, "seconds")
# 0.6880448180017993 seconds

start = timer()

# make 1 batched call with 1000 resources to validate alice's access to `alices-photo-1` and `bobs-photo-1`
result = cedarpy.is_authorized_batch(
    "User::\"alice\"",
    "Action::\"edit\"",
    ["Photo::\"alices-photo-1\"", "Photo::\"bobs-photo-1\""]*500,
    {},
    policies, entities, None, False)

end = timer()
print("done")

print(end-start, "seconds")
# 0.13395986999967135 seconds
@skuenzli
Contributor

Yes, I think there should be an is_authorized signature that accepts a batch of requests.

I don't know if the primary overhead is FFI, parsing, or something else yet, but the Cedar authorizer.is_authorized execution time in Rust is dwarfed by what it takes to get there and back.

From some internal unit tests:

total authz time (micros): Python 45473, Rust 345

The time observed in Python is consistently two orders of magnitude greater than the `authorizer.is_authorized` call. These particular requests use a simple policy, a simple schema, and ~170 entities.

As far as an actual batch implementation goes, I think we should identify key use cases and see how the requests vary. There's a good chance we'll need to provide a way for people to supply a list of requests while keeping their policies and entities static.

In my own access-analysis use cases, we need to build requests that vary actions (primarily) and resources (which might be a nice-to-have).

Another fun bit is that batching means you'll need to correlate the request in the input to the AuthzResult in the output.
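With no explicit correlation mechanism, the minimal approach is positional: pair request *i* with result *i*, which is only safe if the batch API guarantees same-length, order-preserving results. A sketch:

```python
from typing import List, Tuple


def correlate(requests: List[dict], results: List[dict]) -> List[Tuple[dict, dict]]:
    # Positional correlation: request i maps to result i. This relies on
    # the batch implementation returning results in input order; an
    # explicit per-request identifier would be more robust.
    if len(requests) != len(results):
        raise ValueError("batch result count does not match request count")
    return list(zip(requests, results))
```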

@skuenzli
Contributor

Craig Disselkoen (Cedar Team) suggested we may want to offer methods to load and return opaque PolicySet and Entities objects so that we can cache them across is_authorized calls.

I think this is potentially a really good idea based on my first encounter with larger sets of entities.

However, it really drove home that we don't know how `is_authorized` inputs drive execution time, so I've opened #14 to find out.

@skuenzli
Contributor

skuenzli commented Jul 28, 2023

I extracted a sample of is_authorized execution metrics from production today.

Here are summary statistics for 1000 executions of is_authorized in microseconds:

| Summary stats (micros)  | average | p50 | p90   | p99   | max   |
| ----------------------- | ------- | --- | ----- | ----- | ----- |
| total_duration          | 4685    | 442 | 16471 | 64434 | 81317 |
| parse_policies_duration | 3375    | 132 | 13155 | 61384 | 77503 |
| parse_schema_duration   | 348     | 109 | 131   | 12252 | 16257 |
| load_entities_duration  | 461     | 108 | 128   | 16240 | 16522 |
| authz_duration          | 289     | 15  | 168   | 13002 | 18049 |

For our workload, `is_authorized` execution time is dominated by parsing policies. Our workload might be a bit unusual for Cedar/AVP: we're generating policies and schemas, not crafting them by hand, and it's almost certain there are cases where we should generate 'better' policies. (We already implemented a change to minimize the entities to ~10, and that worked well.) But it's data.

Here's a scatterplot with each request's total and components plotted in sequence:

*(scatterplot: cedarpy is_authorized metrics sample, 2023-07-28)*

My main takeaway from looking at these (and other) metrics is that if we can amortize the cost of parsing policies and building the entityset+schema, we should be able to improve the performance of an individual is_authorized request by 3x and sometimes more than 10x.

For our workload, we generally use a given policyset, schema, and entities for more than 30 is_authorized requests.

Currently that costs 140,550 micros (30*4685).

But if we only built the policies, schema, and entities once, that would cost about 12,854 micros (3375+348+461+(30*289)).
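The amortization arithmetic above can be checked directly from the measured averages (note that the averaged `total_duration` of 4685 is slightly higher than the sum of its averaged components):

```python
# Measured per-request averages in micros, from the summary stats table.
parse_policies = 3375
parse_schema = 348
load_entities = 461
authz = 289
total_per_request = 4685  # averaged total; slightly above the component sum
n_requests = 30

# Current cost: every request re-parses policies and rebuilds entities.
per_request_cost = n_requests * total_per_request

# Amortized cost: parse/build once, then pay only authz per request.
amortized_cost = parse_policies + parse_schema + load_entities + n_requests * authz

print(per_request_cost)  # 140550
print(amortized_cost)    # 12854
```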

@skuenzli
Contributor

Implemented is_authorized_batch in #17

@skuenzli
Contributor

Merged the initial is_authorized_batch implementation from #17.

Will keep this open until (at least) we adopt the batch API internally. That will help discover whether we need an explicit request-response correlation mechanism.

@skuenzli
Contributor

When integrating is_authorized_batch into our own codebase, we really wanted a way to explicitly correlate each authz request to its result, so we added an optional correlation_id in #18.
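A sketch of how a caller might use such a correlation ID (field names and shapes are assumptions based on this discussion, not necessarily the exact #18 API):

```python
import uuid


def make_request(principal: str, action: str, resource: str) -> dict:
    # Attach a unique correlation_id so results can be matched back to
    # requests without relying on result ordering.
    return {
        "principal": principal,
        "action": action,
        "resource": resource,
        "context": {},
        "correlation_id": str(uuid.uuid4()),
    }


def index_results(results: list) -> dict:
    # Map correlation_id -> result for O(1) lookup per request.
    return {r["correlation_id"]: r for r in results}
```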
