Client side claims evaluation (#420)

* Initial draft of some basic documentation. * First draft of ADR for client side claims evaluation. * Summary of the different projects * Solution folder reorg * Added notes on glob tokenization. * Added section on caching ResourceAccessRulesets. * Documentation tweaks * Reverted accidental package.lock changes * Fix typo * Fix typo
marain-dotnet · Dec 20, 2022 · 4bf86c0 · 4bf86c0
1 parent 2bd867d
commit 4bf86c0
Show file tree

Hide file tree

Showing 5 changed files with 411 additions and 8 deletions.
diff --git a/Documentation/ADRs/0001-client-side-claims-evaluation.md b/Documentation/ADRs/0001-client-side-claims-evaluation.md
@@ -0,0 +1,157 @@
+# Implementation of client-side Claims Evaluation
+
+## Status
+
+Agreed
+
+## Background
+
+Since the initial implementation of the current incarnation of the Marain Claims API, we have had issues with performance. A common use case we encounter when building hypermedia APIs requires evaluating two sets of claims per API request:
+- On receiving a request, to ensure that the current user has permission to access the endpoint they have requested
+- When returning a HAL document, to strip out any links or embedded documents that the user does not have permission to call
+
+The first implementation of the API required one call per claim evaluation. When link-stripping from a HAL document, it is often necessary to evaluate many (in some cases, up to several hundred) claims at once, and performance with the call-per-evaluation model was predictably poor.
+
+To partially remediate this, a batch evaluation endpoint was added to the API. This allowed clients to POST a batch of evaluation requests to the API and receive a single response containing the results of all evaluations. This has made the API more usable, but performance is still sub-optimal.
+
+The nature of the Claims API is that the data it stores is slowly changing. As a result, a more optimal model is for the client library to retrieve `ClaimPermissions` documents in their entirety from the Claims API and to perform the claim evaluation on the client side. 
+
+## Options
+
+There are several options for implementing this new functionality. The following have been considered.
+
+### 1. Implement behind the existing `IClaimsService` interface, replacing the existing AutoRest-generated library
+
+In this option we would rewrite the existing Marain.Claims.Client library, keeping the existing `IClaimsService` interface to avoid breaking compatibility with existing consumers.
+
+#### Advantages
+- Clients could take advantage of the new features without needing to change their existing code.
+
+#### Disadvantages
+- Would remain tied to the existing interface.
+- No way of knowing whether the new functionality would be a breaking change for some clients since it is very different from the existing implementation.
+
+### 2. Implement behind the existing `IClaimsService` interface as a wrapper for the existing AutoRest-generated library
+
+In this option, we would provide a new implementation of `IClaimsService` which consumers could opt into using. This would likely wrap the existing auto-generated `ClaimsService`.
+
+#### Advantages
+- Clients could take advantage of the new features with minimal changes to  existing code.
+
+#### Disadvantages
+- Would remain tied to the existing interface.
+- Various methods on the existing interface are irrelevent to the client side evaluation functionality. However, the fact the our new functionality shares an interface with methods to add, update and delete the underlying data would likely mean there was an expectation that any caching functionality we implement would handle changes to the underlying data, increasing the complexity of the implementation.
+
+### 3. Implement as an independent feature built on top of the existing `IClaimsService`
+
+In this option we'd define a new interface for our client side evaluation library, and implement it independently of the existing interfaces. We would still depend on the existing `IClaimsService` implementation to communicate with the API.
+
+### Advantages
+- Not constrained by existing `IClaimsService` interface
+
+### Disadvantages
+- Clients would need to make code changes to opt into the new functionality
+
+## Decision
+
+Option 3 will be implemented.
+
+As part of this, we will implement a new `IResourceAccessEvaluator` which uses our new local-evaluation client. We'll provide additional extension methods in `OpenApiClaimsServiceCollectionExtensions` and `OpenApiAccessControlServiceCollectionExtensions` to allow configuring Menes and Marain Claims integration with the new features, meaning that many use cases will only need to change a single line of code to take advantage of the new functionality.
+
+## Implementation notes
+
+This section records decisions that were made during implementation.
+
+### Tokenization of URI and AccessTypes
+
+Question: Should we tokenize all of the URIs and AccessTypes in `ResourceAccessRules` at the point we retrieve a set of `ClaimPermissions` from the API, or do we allow the globbing library to tokenize on each evaluation.
+
+Tokenization is part of the glob matching process, but we have the choice of doing this up front and storing the tokenization data alongside the globs, or allowing the globbing library to tokenize globs during the evaluation.
+
+Both approaches were coded and benchmarked with the following results. Note: in the following tables, "WithTokenization" means using the code paths where URI and AccessMethod globs are tokenized up front.
+
+Firstly, the impact on processing the retrieved `ClaimPermissions` document. As expected, tokenization up front takes longer and results in more memory allocation. Note that no attempt was made at this point to optimize memory allocation.
+
+|                                     Method |      Mean |     Error |    StdDev |   Gen0 |   Gen1 | Allocated |
+|------------------------------------------- |----------:|----------:|----------:|-------:|-------:|----------:|
+|    ProcessClaimPermissionsWithTokenization | 11.841 us | 0.2002 us | 0.1872 us | 1.3580 | 0.0458 |  22.33 KB |
+| ProcessClaimPermissionsWithoutTokenization |  5.181 us | 0.0798 us | 0.0747 us | 0.5951 | 0.0076 |   9.75 KB |
+
+Secondly, the impact on claim evaluation. Again, as expected, evaluation is faster when the globs have already been tokenized.
+
+|                                           Method |     Mean |     Error |    StdDev |   Gen0 | Allocated |
+|------------------------------------------------- |---------:|----------:|----------:|-------:|----------:|
+|    EvaluateSingleClaimPermissionWithTokenization | 2.195 us | 0.0221 us | 0.0207 us | 0.0153 |     312 B |
+| EvaluateSingleClaimPermissionWithoutTokenization | 4.711 us | 0.0461 us | 0.0432 us | 0.0153 |     312 B |
+
+Based on these results, we decided to stick with up-front tokenization.
+
+### Caching processed `ResourceAccessRuleset`s for reuse
+
+Question: Since `ResourceAccessRuleset`s are often reused in multiple `ClaimPermissions`, should we keep a cache of the processed rulesets from each `ClaimPermissions` and reuse these for any future `ClaimPermissions` that reference the same rulesets?
+
+This essentially comes down to determining whether the cost of attempting to look up every ruleset in the cache before processing it is greater than the cost of processing/tokenizing all of the rules in the `ResourceAccessRuleset` again.
+
+The following benchmarks show the impact of this caching when processing the same `ClaimPermissions` twice - meaning that for the second `ClaimPermissions`, all of the `ResourceAccessRuleset`s will be in the cache.
+
+|                                               Method |     Mean |    Error |   StdDev | Allocated |
+|----------------------------------------------------- |---------:|---------:|---------:|----------:|
+|   ProcessSingleClaimPermissionsWithoutRulesetCaching | 29.83 us | 0.590 us | 0.808 us |  23.05 KB |
+| ProcessMultipleClaimPermissionsWithoutRulesetCaching | 49.75 us | 0.993 us | 1.104 us |  45.39 KB |
+|      ProcessSingleClaimPermissionsWithRulesetCaching | 41.74 us | 0.812 us | 1.056 us |  38.64 KB |
+|    ProcessMultipleClaimPermissionsWithRulesetCaching | 53.28 us | 1.012 us | 1.125 us |   46.5 KB |
+
+In this case we can see that loading the second `ClaimPermissions` takes around 20 us without caching, and 12 us with caching. However, we pay a penalty of around 10us to load the first `ClaimPermissions` with caching enabled, which is due to the overhead in looking up each `ResourceAccessRuleset` in the cache.
+
+This benchmark shows the same statistics but with the second `ClaimPermissions` being different from the first, with around 50% of the rules overlapping.
+
+|                                               Method |     Mean |    Error |   StdDev | Allocated |
+|----------------------------------------------------- |---------:|---------:|---------:|----------:|
+|   ProcessSingleClaimPermissionsWithoutRulesetCaching | 28.75 us | 0.571 us | 0.906 us |  23.05 KB |
+| ProcessMultipleClaimPermissionsWithoutRulesetCaching | 44.97 us | 0.871 us | 1.003 us |  41.02 KB |
+|      ProcessSingleClaimPermissionsWithRulesetCaching | 40.46 us | 0.806 us | 0.960 us |  37.66 KB |
+|    ProcessMultipleClaimPermissionsWithRulesetCaching | 50.71 us | 0.990 us | 1.252 us |  44.94 KB |
+
+In this case, we see the same ~10us penalty when loading the first `ClaimPermissions` with caching enabled. Loading the second takes around 16us without caching and about 10 us with caching.
+
+However, there is an additional complexity that caching `ResourceAccessRuleset`s will introduce. `ResourceAccessRuleset`s are entities that exist independently of the `ClaimPermissions` they are used by. The API denormalizes them into the `ClaimPermissions` document to save the need for making separate requests for the rulesets. However, this means that when multiple `ClaimPermissions` that include the same `ResourceAccessRuleset` are retrieved, it would be possible for the `ResourceAccessRuleset`s to be different versions of the same ruleset.
+
+In order to avoid dealing with this complexity (at least for the first iteration of this feature), we will accept the relatively minor impact in processing time and not cache the rulesets.
+
+### Choice of cache
+
+Question: Should we use the .NET standard `MemoryCache` to store `ClaimPermission` documents, or implement our own using a `ConcurrentDictionary` (or similar).
+
+|                                                    Method |     Mean |     Error |    StdDev |   Gen0 | Allocated |
+|---------------------------------------------------------- |---------:|----------:|----------:|-------:|----------:|
+|                             EvaluateSingleClaimPermission | 2.174 us | 0.0135 us | 0.0126 us | 0.0153 |     312 B |
+|              EvaluateSingleClaimPermissionWithMemoryCache | 2.178 us | 0.0150 us | 0.0141 us | 0.0153 |     312 B |
+
+Since the different in performance is negligible, the standard MemoryCache will be used.
+
+### Best approach to evaluating a batch of claims
+
+Question: Should we evaluate batches of claims in series or parallel? Given we know glob evaluation is highly performant, it may be that the overhead of parallelisation costs more than the evaluation itself.
+
+For a small (~35) batch of evaluations:
+
+|                                     Method |      Mean |     Error |    StdDev |   Gen0 | Allocated |
+|------------------------------------------- |----------:|----------:|----------:|-------:|----------:|
+|   EvaluateMultipleClaimPermissionsInSeries | 47.415 us | 0.3260 us | 0.3049 us | 0.8545 |   15120 B |
+| EvaluateMultipleClaimPermissionsInParallel | 49.658 us | 0.2860 us | 0.2675 us | 0.9766 |   16832 B |
+
+For a "realistically large" batch of 280 evaluations:
+
+|                                     Method |       Mean |     Error |    StdDev |   Gen0 | Allocated |
+|------------------------------------------- |-----------:|----------:|----------:|-------:|----------:|
+|   EvaluateMultipleClaimPermissionsInSeries | 382.955 us | 1.9772 us | 1.7527 us | 6.8359 |  120960 B |
+| EvaluateMultipleClaimPermissionsInParallel | 386.091 us | 2.4062 us | 2.1331 us | 7.8125 |  132112 B |
+
+And for an "unrealistically large" batch of  7000 evaluations:
+
+|                                     Method |         Mean |      Error |     StdDev |     Gen0 |    Gen1 | Allocated |
+|------------------------------------------- |-------------:|-----------:|-----------:|---------:|--------:|----------:|
+|   EvaluateMultipleClaimPermissionsInSeries | 9,575.074 us | 65.3881 us | 57.9649 us | 171.8750 |       - | 3024010 B |
+| EvaluateMultipleClaimPermissionsInParallel | 9,785.352 us | 86.6832 us | 81.0835 us | 187.5000 | 31.2500 | 3218618 B |
+
+We can see from this that parallel execution does not provide any benefits and actually adds to the evaluation time for the batch.
diff --git a/Documentation/Concepts/DataStructures.md b/Documentation/Concepts/DataStructures.md
@@ -0,0 +1,164 @@
+# Data Structures in Marain Claims
+
+## Overview
+
+There are three main data structures in Marain Claims:
+
+1. `ResourceAccessRule`
+2. `ResourceAccessRuleSet`
+3. `ClaimPermissions`
+
+`ClaimPermissions` and `ResourceAccessRuleSet` are the two top level entities; `ResourceAccessRule` items can only exist as part of one of these two entities. In addition, `ClaimPermissions` can also contain `ResourceAccessRuleSet`s.
+
+## `ResourceAccessRule`
+
+The ResourceAccessRule is the lowest level entity. It represents a permission for a specific resource or resources, and for a specific access type. The resource or resources to which it relates are defined by a URI; this supports globbing so, by using wildcards, it's possible to define a `ResourceAccessRule` which applies to multiple resources.
+
+It also supports an access type; the meaning of this is determined by the consuming application. An HTTP-based API would be likely to use the standard HTTP verbs (e.g. GET, POST, DELETE) as the different access types, but consumers can define their own set of access types if required.
+
+Finally, it contains a flag indicating whether this rule allows or denies access to the specified resource. There's more details on how this is expected to be used in the Claim Evaluation document.
+
+An example of a `ResourceAccessRule` for an API which manages employee data is shown below:
+
+```json
+{
+    "accessType": "GET",
+    "resource": {
+        "uri": "api/employees/*/contacts",
+        "displayName": "GET employee contacts"
+    },
+    "permission": "allow"
+}
+```
+
+This rule can be interpreted as allowing anyone it's assigned to (see below) to access contact information for all employees (due to the `*` in the URI).
+
+A similar rule in the same API could be:
+
+```json
+{
+    "accessType": "POST",
+    "resource": {
+        "uri": "api/employees/*/contacts",
+        "displayName": "POST a new employee contact"
+    },
+    "permission": "deny"
+}
+```
+
+This rule explicitly denies access to create new employee contact details.
+
+## `ResourceAccessRuleSet`
+
+The `ResourceAccessRuleSet` groups one or more `ResourceAccessRule`s for ease of use. The exact way in which the rules are grouped is determined by the consuming application; at its simplest a `ResourceAccessRuleSet` might contain a single `ResourceAccessRule`, but in a more complex application it could contain far more.
+
+A single `ResourceAccessRuleSet` can be referenced by multiple `ClaimPermissions`.
+
+The below example builds on those above and shows two rules grouped together which between them would allow access to add a new contact to an employee, or to update an existing one:
+
+```json
+{
+    "contentType": "application/vnd.marain.claims.resourceaccessruleset",
+    "id": "employeeAddOrUpdateContact",
+    "displayName": "Add or update contact information for an employee",
+    "rules": [
+        {
+            "accessType": "POST",
+            "resource": {
+                "uri": "api/employees/*/contacts",
+                "displayName": "POST new employee contact"
+            },
+            "permission": "allow"
+        },
+        {
+            "accessType": "POST",
+            "resource": {
+                "uri": "api/employees/*/contacts/*",
+                "displayName": "POST employee contact update"
+            },
+            "permission": "allow"
+        }
+    ]
+}
+```
+
+This shows how multiple rules can be grouped into sets of logically related functionality.
+
+## `ClaimPermissions`
+
+The `ClaimPermissions` data type is used to group a set of `ResourceAccessRuleSet` and/or `ResourceAccessRule` together into a set of permissions which are granted to identites holding a particular claim. For example, if the consuming application implemented access control based on Azure AD application role membership, it would define a ClaimPermissions for each role.
+
+When `ClaimPermissions` which refer to `ResourceAccessRuleSet`s are first created, they look like this:
+
+```json
+{
+    "contentType": "application/vnd.marain.claims.claimpermissions",
+    "id": "Manager",
+    "resourceAccessRules": [],
+    "resourceAccessRuleSets": [
+        {
+            "contentType": "application/vnd.marain.claims.resourceaccessruleset",
+            "id": "employeeAddOrUpdateContact",
+            "rules": []
+        },
+        {
+            "contentType": "application/vnd.marain.claims.resourceaccessruleset",
+            "id": "employeeAddOrUpdateRole",
+            "rules": []
+        },
+```
+
+When an implementation of `IClaimPermissionStore` is used to retrieve a `ClaimPermissions`, it is expected that it will dereference any `ResourceAccessRuleSet` it contains, so the resulting document looks more like this:
+
+```json
+{
+    "contentType": "application/vnd.marain.claims.claimpermissions",
+    "id": "Manager",
+    "resourceAccessRules": [],
+    "resourceAccessRuleSets": [
+    {
+        "contentType": "application/vnd.marain.claims.resourceaccessruleset",
+        "id": "employeeAddOrUpdateContact",
+        "eTag": "\"0x8D819BF7E13D377\"",
+        "displayName": "Add or update contact information for an employee",
+        "rules": [
+            {
+                "accessType": "POST",
+                "resource": {
+                    "uri": "api/employees/*/contacts",
+                    "displayName": "POST new employee contact"
+                },
+                "permission": "allow"
+            },
+            {
+                "accessType": "POST",
+                "resource": {
+                    "uri": "api/employees/*/contacts/*",
+                    "displayName": "POST employee contact update"
+                },
+                "permission": "allow"
+            }
+        ]
+    }
+    {
+        "contentType": "application/vnd.marain.claims.resourceaccessruleset",
+        "id": "employeeAddOrUpdateRole",
+        "eTag": "\"0x8D819BF7E4FB2A5\"",
+        "displayName": "Add or update role information for an employee",
+        "rules": [
+            {
+                "accessType": "POST",
+                "resource": {
+                    "uri": "api/employees/*/role",
+                    "displayName": "POST new employee contact"
+                },
+                "permission": "allow"
+            }
+        ]
+    }]
+}
+```
+
+The underlying store can choose to persist the fully expanded `ClaimPermissions` document back to the store, or re-expand it on every request. If it persists the expanded version back, it should transparently provide a mechanism for periodically ensuring the expanded `ResourceAccessRuleset` data is up to date.
+
+**WARNING**: The current implementation of `IClaimPermissionStore` does persist the expanded `ClaimPermissions` document back to the store, but does not then ensure that the expanded data remains consistent.